{"componentChunkName":"component---src-templates-article-js","path":"/projects/cropperhead","result":{"data":{"markdownRemark":{"frontmatter":{"title":"CropperHead - auto cropper AI","date":"1.6.2020","cover":{"childImageSharp":{"fluid":{"base64":"data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAPABQDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAAAwABBP/EABUBAQEAAAAAAAAAAAAAAAAAAAAB/9oADAMBAAIQAxAAAAHiUNg4qv/EABsQAAEFAQEAAAAAAAAAAAAAAAABAgMREhMh/9oACAEBAAEFAmxWc0rCDJPXy2mj/8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAwEBPwE//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPwE//8QAFBABAAAAAAAAAAAAAAAAAAAAIP/aAAgBAQAGPwJf/8QAHBABAAMAAgMAAAAAAAAAAAAAAQARITFRYXGB/9oACAEBAAE/IR+43LyvMbnYgVDIvBvc92f/2gAMAwEAAgADAAAAEFjP/8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAwEBPxA//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPxA//8QAHBABAAMBAAMBAAAAAAAAAAAAAQARITFBUXGB/9oACAEBAAE/EAA6081RHkLLHVzVE3PkSFOKNjhBqz8eoMA4c2f/2Q==","aspectRatio":1.3333333333333333,"src":"/static/d73cc241ed54b2a47959f8cb0bb77a3c/f422e/ch_cover.jpg","srcSet":"/static/d73cc241ed54b2a47959f8cb0bb77a3c/49b36/ch_cover.jpg 512w,\n/static/d73cc241ed54b2a47959f8cb0bb77a3c/f422e/ch_cover.jpg 640w","sizes":"(max-width: 640px) 100vw, 640px"}}}},"html":"<blockquote>\n<p>This is a nice little post about a neural network.</p>\n<p>I like to periodically spend my free time behind the camera. Capturing moments, entities, phenomena and ideas is extremely rewarding, especially when you get that perfect shot. What isn't rewarding, however, is the time spent on post-processing the images: adjusting the abundant options, tweaking the plethora of configurations and trying to decide the perfect ensemble for a shot. </p>\n<p>This, plus a combination of my innate laziness and curiosity, inspired me to create a small neural network to assist with the process. 
</p>\n</blockquote>\n<p><span class='gatsby-resp-image-wrapper' style='position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 750px; '>\n      <span class='gatsby-resp-image-background-image' style=\"padding-bottom: 97.34042553191489%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAATCAYAAACQjC21AAAACXBIWXMAAA7DAAAOwwHHb6hkAAAFTklEQVQ4yx2UeVDUdRjGt7/K0mlq7FAndFIiSkyxQmssy9Sxcso8ydDU8FbSUExQlEOHS44KVIwF1F1FwNXi2BU5FpZdWGAX9o5dll1wlWPBxQOPnE8/e2fef9555/t93+d9nkc0OvqAnt5+Onu8dPb28fjRQzS1NVyTXcbW0cbNm15q1VYhLbhuDuLs7EKl1NJQo6Srq4s7Xi9uR5dQt3PbdxvR4wcP8N3qo93iRKPv4LZ3gOlvB/LyC8+zPmwleoOdmGwZyUWNqO29lEovsHf7Tk6mxdLcUo+8UklkXAbhvx5D1aZH5HG7iNy9g+UbtrDveDJDA32EzAlmdnAA6zatpk5RwbKVK4jNv4DGYSU5Lp7AyROZ+voYxOJsEn+T8NLc1YyduwJpZTUil8vFL1GRbI/cQ2HxRZ78+5ja2grk8hL07Wp02ibiUlNIuSRD53aiqqsj+tARdkfuo6mlWdjKRPq5UpILi+gQ1hY9vD+KxepE2/4PBpsDg6mTXUkSticVkV+uZqBvkEa1kSq5GrPNiavTRGFBGhelGRjNLfh6exiwGhm0mxm57UXkGxqitEJB3pVyZPVKYbpGZizdyty1B4jOOU+LpoXjySc5mpKLQqWl4vI53gwYw5R3nyE7LxVLWwdVlRWIC/Np1rUieorZ6fNiss6epqjqClUV5UyYFoD/7BAOJR3jcullps4IZsI775GeJ6ZGLiN4zlSCPvAjt/AUDXVKNm3byuTAaUhKLiC6e/cuNWrhZ6UGTYeJHrcbsVTKGYkUdWsz3U4n4gIxp09n06pvZdjbj1ZbT61SQc8NN1ajiXOFheTmnsLZ7UT05N49xL9nkBIbg0YuxWTs5NhvMiqq9Qz77lDd2MwPMbH8uHsXsrK/KJNdxW3v4mmMjHr5IzWOXRFLidizlpraa4iGPTdZNG8eqz4N5ui+jTQqVSxevoFtew9Qfl3OWWkxY96Yzrgp/mTm5nIk5iCzvljAJUUh7c7rAid/4jX/sbwy7XUKJAWIRgYG+XLJYr77JIS9P2+mrrqKwA9mMnfR52TlZVNccpWAj78maNG35JXKSBdwfe7V8UwJ8aNae5Gs1CQCgqYxbuJ4iq+WIHokKMViNmMwGLE7HIz4fJjMNuxdLoaGh7nh6ae+yYCm1Ux3jwfvQD8moddkMuMb8dHn6cVmMWKzmvA9lZ7DaiUhMY2UzJOc+P0P7vuGcQsNirIqGtRNVFbrWLIpi9CIM5Qqmv7XcnhkGks3x6G32clMTiTs+9UkpR7nH4dNeNBsYOv6ZWwJX01MzA7sNgu/7NnB92HriDoSi6rVxvxVEUwKDCEh/QTKa1WIXvRj3KQA2gQtJ8TH4zdlMpsifqapXYeox+Vm/6F4Ig4lkHrqT6wWM0u+XsDCrxZw+HicMKWGL5d9xfshQWSdzMBi0BMaFsqWbRsFOJy06PRExUSTJc4XHMsluM1DAUN7E5q267Tp1DwYHaXfO4ynf4CRe3cE3G5R1mDib5UZc7eHQe8djI4+Wq0ePEN3sei01AuHbFPVMHCrVyD2YD/hB2fy2Xd+hIUvxiNos6yyhgsl
5ahatFypbGLWmiT8l8eTfL4Sg7mXtEvNxBXUUW+0IzknZeeu/YJZRNPY3CLQpq+foI/G4xf4LPMXfihcVMfiLUd5f81eYnIKBC62MmfNQYJCD5NeoqS+Rs7Ow/uZ9+03FBRLyJMU896Mt1i6/idUgp+K7o+MkCNOJC0nijP5mVgd3STkFrM/8ywSeS0GSxcZeZdIEbJWZ6VBMIuoEzlsiD6GQtOM4pqCHTs3k5SdQ6dghf8BQO9QJ1XnM/YAAAAASUVORK5CYII='); background-size: cover; display: block;\"></span>\n  <img class='gatsby-resp-image-image' alt='Training the CropperHead Network' title='Training the CropperHead Network' src='/static/e5aff9ea42745af38550244b1ccb4627/1d69c/ch_train.png' srcset='/static/e5aff9ea42745af38550244b1ccb4627/4dcb9/ch_train.png 188w,\n/static/e5aff9ea42745af38550244b1ccb4627/5ff7e/ch_train.png 375w,\n/static/e5aff9ea42745af38550244b1ccb4627/1d69c/ch_train.png 750w,\n/static/e5aff9ea42745af38550244b1ccb4627/c946b/ch_train.png 805w' sizes='(max-width: 750px) 100vw, 750px' style='width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;' loading='lazy'>\n    </span>  </p>\n<h2>Deciding what to work with</h2>\n<p>I realised that I could extract information on the photo edits from the metadata added to the photo files using <em>Exchangeable image file format (<a href=\"https://www.loc.gov/preservation/digital/formats/fdd/fdd000146.shtml\">Exif</a>)</em>, and use that as training data. 
To make matters a bit easier and the scope of the project a bit smaller, I decided to limit the edits to specifically cropping (and small rotational alignment).</p>\n<p><span class='gatsby-resp-image-wrapper' style='position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 750px; '>\n      <span class='gatsby-resp-image-background-image' style=\"padding-bottom: 65.42553191489361%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAANCAYAAACpUE5eAAAACXBIWXMAAA7DAAAOwwHHb6hkAAABzElEQVQ4y1WTN65CQQxFXwVI5JxzpgCEKEiioQJRUbEPCnZAQcEGYK/+Olfyk35heeyx71yHCTabjY3HY2u1Wtbtdq3T6Viv17N+vy+N4EPwDYfDfza5y+VS/sPhYAHOdDpt8XjcUqmUJRIJ2dls1qLRqPycXXK5nNVqNYPIarWSXalUlAehYDAYWD6f10WpVNK5WCwq+Xq92uVykR9foVCwcrmsxNvtJlBIcJfJZMQ4BEQogwQuCfp+v3Y+n61arUrwN5tNMeSeh3iYHAQsARLYaDTUD5gS/Hg87P1+2+l0UtkkE1ev11UicQisHViA0OQCQFiQ/Hw+7fP5iB1+gIhBw4RqAPBWof+VDBCsYrGY3e93e71etl6v1St65GwQGDootg8QLUDGDVgkErHdbme/388Wi4XWwKeK+BlAHxqsAELzMCsWjEYjNRmZTCZ2PB4FRvN9gl4uIFSDzR0+B8UWQwApjeVkKACzoLTCd4/GO1MfgJfqTEOGs9ks/AH0hoD5fC6mDMT30nsGuJfvw3BAMWT3nD6BMKBPlM+ZV6nAk9GwcyE3mUxqO/i+wXa7tel0GrJEAKGH/lcpH1D/45zb7bY+AppY9H6/tz+j5l6eatFlXwAAAABJRU5ErkJggg=='); background-size: cover; display: block;\"></span>\n  <img class='gatsby-resp-image-image' alt='How to crop a shot?' title='How to crop a shot?' 
src='/static/0012cb54983af798f230b84a0aa23280/1d69c/cropping.png' srcset='/static/0012cb54983af798f230b84a0aa23280/4dcb9/cropping.png 188w,\n/static/0012cb54983af798f230b84a0aa23280/5ff7e/cropping.png 375w,\n/static/0012cb54983af798f230b84a0aa23280/1d69c/cropping.png 750w,\n/static/0012cb54983af798f230b84a0aa23280/4255a/cropping.png 915w' sizes='(max-width: 750px) 100vw, 750px' style='width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;' loading='lazy'>\n    </span><br>\n<em>Hmm, how did those golden ratio rules go again?</em></p>\n<h3>Extracting the EXIF</h3>\n<p>Firstly, I wrote a small Python command-line tool to crawl my photo library for suitable images, i.e. all the photos that had been cropped. I explained this in more detail in the <a href=\"/projects/exifextractor\">ExifAnnotator</a> post.</p>\n<p>Essentially, I could give the tool a path as an argument, and it would quickly find the photos which had been edited, resize them to a predefined resolution (chosen with the network training in mind), put them into a predefined folder and collect a nice JSON label set for said photos. All good to go! My dataset was practically done! Moreover, should I need more photos, I could just run the script again on a separate folder.</p>\n<p>Furthermore, I could expand the set of edits by simply referring to the <em>Exif</em> notation and defining the values I wanted to extract. Sweet.</p>\n<h3>What is a crop?</h3>\n<p>Using the extracted data, we can see that a crop on an image is defined by four values, giving the limits at the top, left, bottom and right. These limits are expressed as fractions of the image, i.e. 0-1, 0 being the leftmost and topmost edges, and 1 correspondingly the rightmost and bottommost edges. Additionally, I included a value describing a rotation of the image, a positive value describing a clockwise rotation and a negative value an anti-clockwise one. 
</p>\n<table>\n<thead>\n<tr>\n<th>Metadata tag</th>\n<th>Example value</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><code class=\"language-text\">XMP:CropTop</code></td>\n<td>0.217628</td>\n</tr>\n<tr>\n<td><code class=\"language-text\">XMP:CropLeft</code></td>\n<td>0</td>\n</tr>\n<tr>\n<td><code class=\"language-text\">XMP:CropBottom</code></td>\n<td>0.9646</td>\n</tr>\n<tr>\n<td><code class=\"language-text\">XMP:CropRight</code></td>\n<td>1</td>\n</tr>\n<tr>\n<td><code class=\"language-text\">XMP:CropAngle</code></td>\n<td>-0.52</td>\n</tr>\n</tbody>\n</table>\n<p><em>An example of the crop defined by the metadata tags. The photo in question has been cropped in the vertical direction, and tilted slightly anti-clockwise.</em></p>\n<p>Already we have the values which we want to predict: a vector containing five values. Excellent! Moreover, the four crop limits are neatly within the range 0-1, and can easily be shifted to the range -0.5 to 0.5 to centre them around zero.</p>\n<p><span class='gatsby-resp-image-wrapper' style='position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 750px; '>\n      <span class='gatsby-resp-image-background-image' style=\"padding-bottom: 63.829787234042556%; position: relative; bottom: 0; left: 0; background-image: 
url('data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAANABQDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAAAAIBBf/EABUBAQEAAAAAAAAAAAAAAAAAAAID/9oADAMBAAIQAxAAAAHm1O0mUK//xAAYEAADAQEAAAAAAAAAAAAAAAAAARECEP/aAAgBAQABBQJQirxOUTP/xAAVEQEBAAAAAAAAAAAAAAAAAAAQIf/aAAgBAwEBPwGn/8QAFREBAQAAAAAAAAAAAAAAAAAAECH/2gAIAQIBAT8Bh//EABYQAAMAAAAAAAAAAAAAAAAAABAgQf/aAAgBAQAGPwIRP//EABgQAQEBAQEAAAAAAAAAAAAAAAEAESFB/9oACAEBAAE/IR7GGBjkqhGZU7f/2gAMAwEAAgADAAAAEFc//8QAFhEBAQEAAAAAAAAAAAAAAAAAABEB/9oACAEDAQE/EDLH/8QAFhEBAQEAAAAAAAAAAAAAAAAAAQAR/9oACAECAQE/EKpt/8QAGxABAAMBAAMAAAAAAAAAAAAAAQARIUExUWH/2gAIAQEAAT8QYOr2OSlkrpV1lIVd8sNKXYbEbgoILz7NiWsn/9k='); background-size: cover; display: block;\"></span>\n  <img class='gatsby-resp-image-image' alt='Crop factors' title='Crop factors' src='/static/41f33e66b87d990b0b3b5abd28454f31/acb04/cropvalues.jpg' srcset='/static/41f33e66b87d990b0b3b5abd28454f31/bc01b/cropvalues.jpg 188w,\n/static/41f33e66b87d990b0b3b5abd28454f31/bf173/cropvalues.jpg 375w,\n/static/41f33e66b87d990b0b3b5abd28454f31/acb04/cropvalues.jpg 750w,\n/static/41f33e66b87d990b0b3b5abd28454f31/16ad0/cropvalues.jpg 907w' sizes='(max-width: 750px) 100vw, 750px' style='width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;' loading='lazy'>\n    </span><br>\n<em>How do you crop your crop?</em></p>\n<hr>\n<h2>Modelling the network</h2>\n<p>Now, I really enjoy tackling problems head on. Thanks to that, I have ended up re-inventing the wheel several times over. Naturally, this is something I have learned to avoid by putting more effort in background research and utilising existing solutions. </p>\n<p>Nevertheless, I still find trying to solve a problem on my own extremely interesting - without any help from <em>\"how it is done the best way\"</em> -guides. 
Simply breaking the problem down and working on your own solution can help you gain knowledge and deeper understanding on the topic; knowledge that you would miss should you simply follow a \"proven\" solution.</p>\n<p>Naturally, such luxuries are rare in a world that demands fast pace and high quality. Since this was not the case on this project, I jumped straight in! </p>\n<h3>What sort of network should I use?</h3>\n<p>Naturally, as we are processing images, I chose to use a CNN. I started with a simple one containing only a few layers, eventually leading to a fully connected layer which would output the prediction.</p>\n<p><span class='gatsby-resp-image-wrapper' style='position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 750px; '>\n      <span class='gatsby-resp-image-background-image' style=\"padding-bottom: 25%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAFCAYAAABFA8wzAAAACXBIWXMAAAsSAAALEgHS3X78AAAAtUlEQVQY012QSQrFMAxDc/97FXqHkl03HdKRzoM+z+C/qME4sSVFcTjPU9d16XkeHcehYRg0z7PWddWyLOr7Xikl7fuubdvUdZ31wZRladxxHA3XNI1CURTKsswIJGACMiTALjBNk92p9KqqUl3Xuu/bBMkQY1Se538xwLhFEHLbtuYckvcQ58zv4OCSORHc2fu+JkQFyNlFnEAypzL/BrPgF3bGfnBDQGKfOPY987Dv0wW++QOO64OXyI9U/gAAAABJRU5ErkJggg=='); background-size: cover; display: block;\"></span>\n  <img class='gatsby-resp-image-image' alt='The essence of CNNs' title='The essence of CNNs' src='/static/10b0eed3780e9b9ed9803b4c9d28e47d/1d69c/CropperHead_CNN.png' srcset='/static/10b0eed3780e9b9ed9803b4c9d28e47d/4dcb9/CropperHead_CNN.png 188w,\n/static/10b0eed3780e9b9ed9803b4c9d28e47d/5ff7e/CropperHead_CNN.png 375w,\n/static/10b0eed3780e9b9ed9803b4c9d28e47d/1d69c/CropperHead_CNN.png 750w,\n/static/10b0eed3780e9b9ed9803b4c9d28e47d/78797/CropperHead_CNN.png 1125w,\n/static/10b0eed3780e9b9ed9803b4c9d28e47d/91608/CropperHead_CNN.png 1251w' sizes='(max-width: 750px) 100vw, 750px' 
style='width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;' loading='lazy'>\n    </span><br>\n<em>The very basic structure of any CNN: convolutions and subsamplings</em></p>\n<p>Along with the basic CNN building blocks, <strong>subsampling</strong> and <strong>convolutional</strong> layers, I eventually added <strong>batch normalisation</strong> and <strong>dropout</strong> layers to mitigate overfitting. After all, I was using a relatively small set of training data. The final architecture of the network was obtained by trial and error, using <strong>TensorBoard</strong> as a tool to observe the performance of the network and the effect of the changes made.</p>\n<h3>How should I evaluate this?</h3>\n<p>I went with the simplest initial plan there was: using the mean squared error (MSE) of the predictions. I wanted to see how well I could make the network run without introducing anything fancy, such as the area overlap used in many object detection methods.</p>\n<p><img src=\"/static/tb_ext-e41a2323e6e474ac3464bcb862faa2dc.gif\" alt=\"The values - they&#x27;re changing!\"><br>\n<em>Crops for unseen data during training for each completed epoch, using a custom TensorBoard callback</em></p>\n<h3>How did I set it up?</h3>\n<p>I had noticed the new release of TensorFlow 2.0 and was keen on trying it out. I set up some draft <strong>Jupyter notebooks</strong> with <strong>TensorFlow</strong> and <strong>CUDA</strong> acceleration installed, and started developing. I also whipped up a small script to visualise the progress on evaluation data in <strong>TensorBoard</strong>.</p>\n<p>Then, I just started iterating.</p>\n<hr>\n<h2>Training the network</h2>\n<blockquote>\n<p>Now, first I need to address the issue of the network design. Considering the challenge and my lack of experience with CNNs for such applications, I didn't go for any specific architecture. 
I started reading basic articles on image processing and modified the network accordingly. In other words, this could have been much more sophisticated, and it will probably make a couple of sworn deep learning peeps pull a healthy handful of hair out of their heads.</p>\n</blockquote>\n<p>Next, I'll describe a few problems I ran into and how I tried to fix them.</p>\n<h3>Problem with the limits</h3>\n<p>So, it turns out that the network thinks that there is interesting stuff happening outside the photo boundaries as well. </p>\n<p><span class='gatsby-resp-image-wrapper' style='position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 290px; '>\n      <span class='gatsby-resp-image-background-image' style=\"padding-bottom: 43.61702127659575%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAJCAYAAAAywQxIAAAACXBIWXMAAA7DAAAOwwHHb6hkAAACcUlEQVQoz1XSSUjUARTH8X9KuW+jTrnkkprjuKbTalKHFjCzFSyEiqAocYsQi6JLh+jQQbGDxESD6cF1tHFLR01zGWd0dFwyFxDKLc0ltcXIb4Oi0YN3/bwf7z1B1z1I59AY+uFJtMPTNPaNU6sfpbT5IzVdAyyvLLFRwwYD70qK6VCrea9S0VqhQlNVQVNZKZraKqZnJhBa69oxfPhEm26IupYBymv1FKrakOeqyatsYP7HHKur62B6wm3MBAFve1vcLMzws7XA19jBImvCfZwory9GmBmfZHZ2jqmJCbQdvdQ2aOgx9KDXtNPR1fIfmJqcjGAELa2t2WJqgtjRnh1iEV6uYqRSD5T1SoSp8THmlxeYmvuKrm8AbbeBrt5etJ16GjSNzH3/B95JSloDzc0tiI09w7GjUTiL7LGxtMLNQ0xpXQnCl5EeZqZG6e7R0z/YT7OuA5W6iezXb1CUqvj2a54/GwkT10EHBxHx8Ze4FneaoN2e2FhZ4bHLhYK3hQj9DdXUVFWTkSUnU57Pk+e53H+aTeLDDJ4p8v4D76amrIFisRMxsdFciD2OVOKJh6sz0mAvitXGHfr7h+MbegCf4H2EHD5BdNx1Dp66zM6wI5y7eYul34ub4KMH6bhsNx5gjx9K42VTEq4iC/Nmf5gPkZESypvKjAPNBATHbQgOWxHsTDERWyLydycgSkbi43ss/FzYBNPSEjG3EwiJcOfKjYtEx+xFEigiMMgJ2SF3SuoLjAklvgREhBIoC0Nq7IDwEHyDpZw8e55MxUsWVxY3/7C1pYkX8iwKi3KorC5Dqcwjv0BBfv4rikpyGPk8zF/JK9YOHIgK4wAAAABJRU5ErkJggg=='); background-size: cover; display: block;\"></span>\n  <img class='gatsby-resp-image-image' alt='Crops breaking the four walls?' title='Crops breaking the four walls?' 
src='/static/02656ff7eea97e6ede0cd57b51341c32/139a5/out_of_bounds.png' srcset='/static/02656ff7eea97e6ede0cd57b51341c32/4dcb9/out_of_bounds.png 188w,\n/static/02656ff7eea97e6ede0cd57b51341c32/139a5/out_of_bounds.png 290w' sizes='(max-width: 290px) 100vw, 290px' style='width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;' loading='lazy'>\n    </span><br>\n<em>The CNN low-key trying to hint that you should invest in a wide-angle lens</em></p>\n<p>As technically the values for the prediction were not limited in any manner, they soon went ballistic. Initially, I had hoped that the training data would be enough to limit the values, but obviously it was not. Hence I figured there were two things that I could do:</p>\n<ol>\n<li>Add an extra error term which would punish values beyond [-1,1]</li>\n<li>Add more data</li>\n</ol>\n<p>As getting more samples was a limited option, I tried to create a custom loss function which would, by some scaling, add error if the values were outside the set limits.</p>\n<div class=\"gatsby-highlight\" data-language=\"py\"><pre style=\"counter-reset: linenumber NaN\" class=\"language-py line-numbers\"><code class=\"language-py\">    <span class=\"token comment\"># Clip constraint:</span>\n    clip_constraint <span class=\"token operator\">=</span> BK<span class=\"token punctuation\">.</span><span class=\"token builtin\">sum</span><span class=\"token punctuation\">(</span>BK<span class=\"token punctuation\">.</span>square<span class=\"token punctuation\">(</span>y_pred <span class=\"token operator\">*</span> BK<span class=\"token punctuation\">.</span>cast<span class=\"token punctuation\">(</span>tf<span class=\"token punctuation\">.</span>math<span class=\"token punctuation\">.</span>logical_or<span class=\"token punctuation\">(</span>\n            BK<span class=\"token punctuation\">.</span>less_equal<span class=\"token punctuation\">(</span>y_pred<span class=\"token punctuation\">,</span> <span class=\"token 
number\">0.0</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">,</span> BK<span class=\"token punctuation\">.</span>greater_equal<span class=\"token punctuation\">(</span>y_pred<span class=\"token punctuation\">,</span> <span class=\"token number\">1.0</span><span class=\"token punctuation\">)</span>\n    <span class=\"token punctuation\">)</span><span class=\"token punctuation\">,</span> BK<span class=\"token punctuation\">.</span>floatx<span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">,</span> axis<span class=\"token operator\">=</span><span class=\"token operator\">-</span><span class=\"token number\">1</span><span class=\"token punctuation\">)</span></code><span aria-hidden=\"true\" class=\"line-numbers-rows\" style=\"white-space: normal; width: auto; left: 0;\"><span></span><span></span><span></span><span></span></span></pre></div>\n<p><em>Snippet of the custom loss. <code class=\"language-text\">BK</code> is the <strong>Keras backend</strong>. The full module can be found in the <a href=\"\">GitHub repository</a>.</em></p>\n<p>This, however, did not yield any better results. Moreover, I could simply clip the predicted values to fixed limits after inference, so I did not spend too much time thinking of improvements. It might be beneficial to circle back to this problem in the future, as I find that many real-world problems are inherently bounded in some way. </p>\n<h3>Left is right, up is down</h3>\n<p>Another problem arising from how I used the predictions was that the predicted limits could cross (for example, the left limit ending up to the right of the right limit), leaving a crop with no area. This, however, was a relatively rare case and seemed to go away with more training iterations. 
</p>\n<p>I wonder, though, would the use of another type of error criterion help with this?</p>\n<h3>Overfit us</h3>\n<p>On practically every initial run, the graphs showing the accuracy and loss on familiar and unseen data fed to the network were not that satisfactory of a shape: the network was clearly overfitting the data... No wonder the photos looked so nice in the end.</p>\n<p><span class='gatsby-resp-image-wrapper' style='position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 741px; '>\n      <span class='gatsby-resp-image-background-image' style=\"padding-bottom: 52.659574468085104%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAYAAAB/Ca1DAAAACXBIWXMAAA7DAAAOwwHHb6hkAAABUElEQVQoz51SXXPDIAzL//+XW/eydkmalASwAU02XdftuocudzoUPoRsMYQQkFKCf61dh/ZvDCICQ6sVSQqmLbtwaxW1PoeroFJICMXxkhwnYkuCUsrzgkqxS8w4rhFrpFOWfEmKtyVh3MRbUO4P3i5pqKqE9PHmMGe6yTicI2Ab405sUI6ncUYcT6jbihoWlMsZZZlQ1pm8/9uactQ9eASDUD1tAfuy+KJv4oHGDTsDG6czQuwtYKPR6Ig2SYv33ZyJFijRHVJQc0KlQJWMykkvxwNv3obXeWf5rCQrAtuxcM64wfac94wonQ85M2WKlnoVuvboK5DMmy0wc2gHZwp/BPY3ZH8RI/nLtPnF8FBYggXTBf5O1datZC+7dijn3tdE19IDNcEodEEHdsCg1oI7yD2Xn9zer/GindvFw2Ep2LV5/Q9ff/3m9QH//Q4/ARFmYLlsXZxIAAAAAElFTkSuQmCC'); background-size: cover; display: block;\"></span>\n  <img class='gatsby-resp-image-image' alt='Accurate overfitting' title='Accurate overfitting' src='/static/6a0743052140bb833676b55c4288b247/13e20/overfit.png' srcset='/static/6a0743052140bb833676b55c4288b247/4dcb9/overfit.png 188w,\n/static/6a0743052140bb833676b55c4288b247/5ff7e/overfit.png 375w,\n/static/6a0743052140bb833676b55c4288b247/13e20/overfit.png 741w' sizes='(max-width: 741px) 100vw, 741px' style='width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;' loading='lazy'>\n    </span><br>\n<em>CNN: \"If something is worth fitting, it is worth overfitting\" Orange being the training data, blue the evaluation data.</em></p>\n<p>To 
combat this, I added some <strong>dropout</strong> and <strong>batch normalisation</strong> layers to my network, as well as made it deeper and more narrow. However, nothing beats having more data. </p>\n<h3>Homogenetic crops</h3>\n<p><span class='gatsby-resp-image-wrapper' style='position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 750px; '>\n      <span class='gatsby-resp-image-background-image' style=\"padding-bottom: 66.48936170212765%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAANCAYAAACpUE5eAAAACXBIWXMAAAsTAAALEwEAmpwYAAACTUlEQVQ4y21Ty2oUQRTtX3btyp0u3AhuJOADxIUgChERxYhJfMRkQtTEjISZ9Dy6e/rdVf2Y7kk4nls1MyTi4tBVdW+duufc247r5dg5GKHSDZKkRBRrxIk2a0HMvdZzZHkNPyiQpuU6dhWSp1QD50ffx8aLI0y8DONJBpVXiCKFMLQISJKRJAgLuOPUxGaz4hpWeTnvOv1BhBt3P+Dm/W3cerCL91+GOB+lJiEIcvNNDaGCy/NoeXk2u04suVlGwoClPn1zgsebv/Dk9TE2Xh5ha+8c/WGMsZ+bV+u6Q0rJHi8WRW2QppoEJfLlXvKqqoXTtgu0845oseguUFc1vaig6GnTdJhRYsyXu26Bspwb1HVLktKgquxeM7/i1xF2n5WsyrZSlDnLWcH2vouP3120TQvPy6/ItB6vPJxOM3i0wklTtQ4K4sgioRWVqrHbG9GCIZqqMV0Ow2JNtCITXz0WID47IiFgwHRPjGcnBYfHHl5t/cHtR1+x99NDR1tsnja5FhqhFMDHJ1IhC6OHnfFA0PDSNCDhJMKEwcGEa7+AKhtcXlzaHPo6Z55SbEiuuF7wrEXJhhgPpUJ5SV4x48Eqnr87NWeaknM2xBhO8w9PfMYV0uUg/wstg209tH6IDJF779m+HeLlzMWxQqmtnz1aIf4a//432NJlO6iFIbCEB+u/QrpdMDFkxXcefjM/wOlZyMZpO/zrwRbCGo7oF1KRJfLdacKuDpBQfknvJCYz6pN889MZ3n4eoPc7wGAUc/hDRJSf8cG8qIwtfwEuXclq7rdrdgAAAABJRU5ErkJggg=='); background-size: cover; display: block;\"></span>\n  <img class='gatsby-resp-image-image' alt='Farmers&#39; tip: diversify your crops' title='Farmers&#39; tip: diversify your crops' src='/static/5753833e8f5b6671d2894493c2f78903/1d69c/crop_dist.png' srcset='/static/5753833e8f5b6671d2894493c2f78903/4dcb9/crop_dist.png 188w,\n/static/5753833e8f5b6671d2894493c2f78903/5ff7e/crop_dist.png 375w,\n/static/5753833e8f5b6671d2894493c2f78903/1d69c/crop_dist.png 750w,\n/static/5753833e8f5b6671d2894493c2f78903/d53ff/crop_dist.png 1068w' 
sizes='(max-width: 750px) 100vw, 750px' style='width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;' loading='lazy'>\n    </span><br>\n<em>The crop values used in training were heavily skewed, which may have biased the predictions.</em></p>\n<p>Additionally, the data was quite heavily skewed towards crops in the top left corner, and towards rather large areas instead of clear, small ones. This was naturally the result of my cropping style, so it was also interesting to learn a bit about myself. However, some data augmentation, such as horizontal flipping, could have been added to even out the distribution of the training data fed to the network. </p>\n<h3>Final results</h3>\n<p>After numerous iterations on the network architecture and loss functions, I ended up with the following architecture: </p>\n<div class=\"gatsby-highlight\" data-language=\"py\"><pre style=\"counter-reset: linenumber NaN\" class=\"language-py line-numbers\"><code class=\"language-py\">Model<span class=\"token punctuation\">:</span> <span class=\"token string\">\"sequential_1\"</span>\n_________________________________________________________________\nLayer <span class=\"token punctuation\">(</span><span class=\"token builtin\">type</span><span class=\"token punctuation\">)</span>                 Output Shape              Param <span class=\"token comment\">#   </span>\n<span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token 
operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">=</span>\nconv2d_6 <span class=\"token punctuation\">(</span>Conv2D<span class=\"token punctuation\">)</span>            <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">298</span><span class=\"token punctuation\">,</span> <span class=\"token number\">298</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>      <span class=\"token number\">1792</span>      \n_________________________________________________________________\nmax_pooling2d_4 <span class=\"token punctuation\">(</span>MaxPooling2 <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">149</span><span class=\"token punctuation\">,</span> <span class=\"token number\">149</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>      <span class=\"token number\">0</span>         \n_________________________________________________________________\nconv2d_7 <span class=\"token punctuation\">(</span>Conv2D<span class=\"token 
punctuation\">)</span>            <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">147</span><span class=\"token punctuation\">,</span> <span class=\"token number\">147</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>      <span class=\"token number\">36928</span>     \n_________________________________________________________________\nmax_pooling2d_5 <span class=\"token punctuation\">(</span>MaxPooling2 <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">73</span><span class=\"token punctuation\">,</span> <span class=\"token number\">73</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>        <span class=\"token number\">0</span>         \n_________________________________________________________________\ndropout_2 <span class=\"token punctuation\">(</span>Dropout<span class=\"token punctuation\">)</span>          <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">73</span><span class=\"token punctuation\">,</span> <span class=\"token number\">73</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>        <span class=\"token number\">0</span>         \n_________________________________________________________________\nconv2d_8 <span class=\"token punctuation\">(</span>Conv2D<span class=\"token punctuation\">)</span>            <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">71</span><span class=\"token 
punctuation\">,</span> <span class=\"token number\">71</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>        <span class=\"token number\">36928</span>     \n_________________________________________________________________\nmax_pooling2d_6 <span class=\"token punctuation\">(</span>MaxPooling2 <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">35</span><span class=\"token punctuation\">,</span> <span class=\"token number\">35</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>        <span class=\"token number\">0</span>         \n_________________________________________________________________\nconv2d_9 <span class=\"token punctuation\">(</span>Conv2D<span class=\"token punctuation\">)</span>            <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">33</span><span class=\"token punctuation\">,</span> <span class=\"token number\">33</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>        <span class=\"token number\">36928</span>     \n_________________________________________________________________\nmax_pooling2d_7 <span class=\"token punctuation\">(</span>MaxPooling2 <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">16</span><span class=\"token punctuation\">,</span> <span class=\"token number\">16</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>        <span class=\"token number\">0</span>         
\n_________________________________________________________________\ndropout_3 <span class=\"token punctuation\">(</span>Dropout<span class=\"token punctuation\">)</span>          <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">16</span><span class=\"token punctuation\">,</span> <span class=\"token number\">16</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>        <span class=\"token number\">0</span>         \n_________________________________________________________________\nconv2d_10 <span class=\"token punctuation\">(</span>Conv2D<span class=\"token punctuation\">)</span>           <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">14</span><span class=\"token punctuation\">,</span> <span class=\"token number\">14</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>        <span class=\"token number\">36928</span>     \n_________________________________________________________________\nconv2d_11 <span class=\"token punctuation\">(</span>Conv2D<span class=\"token punctuation\">)</span>           <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">14</span><span class=\"token punctuation\">,</span> <span class=\"token number\">14</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>        <span class=\"token number\">4160</span>      \n_________________________________________________________________\nflatten_1 <span class=\"token punctuation\">(</span>Flatten<span class=\"token punctuation\">)</span>          <span 
class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">12544</span><span class=\"token punctuation\">)</span>             <span class=\"token number\">0</span>         \n_________________________________________________________________\ndense_3 <span class=\"token punctuation\">(</span>Dense<span class=\"token punctuation\">)</span>              <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">64</span><span class=\"token punctuation\">)</span>                <span class=\"token number\">802880</span>    \n_________________________________________________________________\ndense_4 <span class=\"token punctuation\">(</span>Dense<span class=\"token punctuation\">)</span>              <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">32</span><span class=\"token punctuation\">)</span>                <span class=\"token number\">2080</span>      \n_________________________________________________________________\ndense_5 <span class=\"token punctuation\">(</span>Dense<span class=\"token punctuation\">)</span>              <span class=\"token punctuation\">(</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span> <span class=\"token number\">5</span><span class=\"token punctuation\">)</span>                 <span class=\"token number\">165</span>       \n<span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token 
operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">==</span><span class=\"token operator\">=</span>\nTotal params<span class=\"token punctuation\">:</span> <span class=\"token number\">958</span><span class=\"token punctuation\">,</span><span class=\"token number\">789</span>\nTrainable params<span class=\"token punctuation\">:</span> <span class=\"token number\">958</span><span class=\"token punctuation\">,</span><span class=\"token number\">789</span>\nNon<span class=\"token operator\">-</span>trainable params<span class=\"token punctuation\">:</span> <span class=\"token number\">0</span>\n_________________________________________________________________</code><span aria-hidden=\"true\" class=\"line-numbers-rows\" style=\"white-space: normal; width: auto; left: 
0;\"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></pre></div>\n<p><em>The final architecture of the network</em></p>\n<p>For loss, I found the basic SME to work best. Generally, the resulting crops seemed sensible, and more importantly, deterministic based on the input image. I was satisfied, considering my skills and the time I had allocated for this project in between my studying and work. </p>\n<p>However, as one can obviously tell, the resulting crops are rather, well, open for interpretation: had the network really learnt to crop the images, or simply pseudo-randomly sprouted predictions, using the input image as a seed and the training crop distributions as the value pool? Especially, the values seemingly restricted to [|0.6, 1|], practically any random crop could be interpreted by a user as a \"sensible crop\", especially if the image was already at least somehow sensibly cropped to begin with.</p>\n<p>It would be indeed interesting to have a comparison of the different methods: one image, cropped by a human, a trained CNN and a random generator with sensible limits. Would a person be able to tell a difference? What would it prefer? Could they be improved? Could any of these systems learn from each other? 
Is there something to be learnt from the expression \"beauty is in the eye of the beholder\"? </p>\n<p><span class='gatsby-resp-image-wrapper' style='position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 750px; '>\n      <span class='gatsby-resp-image-background-image' style=\"padding-bottom: 79.7872340425532%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAQCAYAAAAWGF8bAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAEiUlEQVQ4yx2TWVCTBxSFf512dNSiiFQqKqtht0FAJYhsSqsoLkSRxREKSKNUwA2VHSSAVDHIFpbKYl0AMQRJAsSioFFRQNSi1A5YGbWtTn1o+9Cnr2kf7sOZuXPmmznnCDrtXa5e6eNaaw/ynFwK8/LIyswgMyODk/kFHEs7zsDAAD0djRTkpqEsLeCUPJdieQ75uTmcPVWIQnGG8qpKnr8YRziceo49OwtQZDVzKDoJi9nzmCFMY+b0aSwwmY8gCJSWniE5dh22pgL+zuZYLDDhs/kf42xtgqf9Ivy9P0ciEaO+rkbIy2lEllBG3Wk1/Zd6uSgvYon5QuysrLFZasvsWfOorVGSuDsMVysTQiUivOytcHVciHSjE7HBqwjx9cBjuQ2aLhVCduZ5ZPFlKORtGK7cZOpGD1EbvXFYaomLjYhP55pRo1QiMxq6G4mCxIuReloTE+jI8iXzSd6ymqTwQObMnknb1RaEHKPhPiPhuaJ2bjTq4NmPqCrSsba0YKWTK3YWllRWKclLiWNfoD0Hgt1I8HdF5uvKN2udacqNpSIjhkgjRH9fN0J+ThMHZJWUl6j44aKeYZUKmyWLsLRcjL2tA6ZGwurqamJ3hGJjPo9VIitCVojY5u3GEakPquJExtRnedlXz1/vJhEOHSwnKvwk8uxL1Bc3EB4UzEfTBCzM5mJmavZ/KMUlJURv+wKPpaZYms5hzozpOFrOxcthMetX2FOwP4wLRft5O/EYYaD/Cb3aIR49mEDfeRNdSzvXmpuoqayiTlmH4kwZQ0PDxp/rtJyvprG2kqpyBd9Vn+O8soLqijKaaitoba7j3a9TCIZBLRev1dCiaaDz1mX6HnYz/OIedx4aGLh/l94HOsYnxjD09XJrcIy7P/1J39M/0D9+T/foe3SjH9CO/E7ngykm3nww1qZqP5v2erAzKZCE9O0czIshOSeCtNMxZJQcJ7kwmp7bbRSlSNkVKqH6cj8d9z/QeusVF7RP+V73jEbNKOUtBp68eIOgVGWwNz+Q3Yd92SHzJeGYlISMjRxVRFLckMKe4/60aJWk75Pi6jCLK5eaeTtu4Lamlddv/+HR0zfcG5lEpR9l8tVvCJUdx0gtCiX12xBkxcHEZm8gpXQ7OXWxRJ4IYs0uG5rby0mN28oyh0/YstWHqG1uRG6wR9NST03dBTKzslAPvmbq/d8IDZpsEuU+HK34ktz6KGSF60jMX8/X8iBiT/qwKWkZl9WVJEZtws56Bk7OFoiM5+5iTpjEHC9XY40CXFAUn2B8zJhyel0c0nQxR0/FECHbTFRqAJGHPYlNW0187hp2ZbnR2q0kJmIHomUWrBCLkHi74LHSmdWejtja2bDezwtfsT1dHe1Gw4o4wk948NWRzUijfUlJDiMoUoQ0xZWIQ2JCom1pa6slIT4esbszPgESVgb4Ifbzw3PtZtwl6xD7eGPn4GSc3lWE55PDGEa09N1TMfSol/sjGgb+03e66Df
ojHPq4tXkzwwaK6TRqNHpdVzv0dKl09PT04+uW0+3vpfOTjW/vJzkXziY/pFCMs9KAAAAAElFTkSuQmCC'); background-size: cover; display: block;\"></span>\n  <img class='gatsby-resp-image-image' alt='Prediction 3' title='Prediction 3' src='/static/8bedf6fc2bfaf0afe4e941873b7e95bb/1d69c/pred_collage.png' srcset='/static/8bedf6fc2bfaf0afe4e941873b7e95bb/4dcb9/pred_collage.png 188w,\n/static/8bedf6fc2bfaf0afe4e941873b7e95bb/5ff7e/pred_collage.png 375w,\n/static/8bedf6fc2bfaf0afe4e941873b7e95bb/1d69c/pred_collage.png 750w,\n/static/8bedf6fc2bfaf0afe4e941873b7e95bb/00d43/pred_collage.png 1000w' sizes='(max-width: 750px) 100vw, 750px' style='width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;' loading='lazy'>\n    </span>\n<em>Some example predictions.</em></p>\n<hr>\n<h2>Time to go public</h2>\n<p>Naturally, I wanted to show off this cool little toy I had created. I did not want to drag my laptop around or record hours of my screen - nah, I'd rather whip up a simple tool to let people play around with the network. Sharing is caring after all, right?</p>\n<p>I was also interested in the then-recent <a href=\"https://www.tensorflow.org/js\"><code class=\"language-text\">TensorFlow.js</code> (TensorFlow JavaScript)</a> library. 
Apparently with this, it would be rather straightforward to add your networks to web applications.</p>\n<p><span class='gatsby-resp-image-wrapper' style='position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 750px; '>\n      <span class='gatsby-resp-image-background-image' style=\"padding-bottom: 71.27659574468085%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAYAAAAvxDzwAAAACXBIWXMAAA7DAAAOwwHHb6hkAAADQElEQVQ4y62T3U9bZRjA+ycYM42JJliB0paPQltgHYyPBU6/zzntOT2Uj9LN2jIqG6LtJlDsZIwKQS/I3NgHywh0ihfzYomGK5PNeTkTMHFApmLURXfjf/DzlOFH4oU3Xvzy5H3eN7/3ed48ryHc3YTP3UW7INHa6abWeYS6xpZ/0dDchv1wO3bXAYfb/qKhuQWnq41OdwBDwCughkOoEe2AHjQtSuRP9HUp+v1But1eugTPPoLHh9vrR/AGEAIqgl/B4wti0JQwX32+zoP7d3Q2uH1rleLNa3yydpP1tRWWL1+keOMqJ/qiNDntdBxtIaKqJJNJ+no1lIFh+i7cQ5n8jECoF0NvROWL2yvcWb/I/Y2PODs2Qq8WJnsqw2j6dVKJ4wynXsPlsPP8oUNYzGa9iwivJhJoiohXOYE6/SXh3Aa+UD8GRRJJ9YskTgb48PoZhoY14gMq87lrHO+J06+GGNQi2C0Wnnv2GWw2K6GwghyOIKr9iD0JgtEkPiVOIChhkPw+/N4mpL4OMuMxZqbe4J0zs8zm3iM3niKhKaQiIY41OrBUGGk90kg4JOt5N0MzH5O+8SNDV7ZJLH6NFIk9rXDh3RHeTGsUskmWCgUW8gWuFOZYvbTIYmGK+bNZ1K5Ommy1uJpdpAZkvr10jN17RR798DPfPdrlm60tYjFdGFU11j64wKlelamRLPnMBOczc0ymJ5gdn2Tl6hzFy++jCB1YjGU4GhwMxUI8XPKwfbfIzvc/sbuzzebm5lOhHBTJJE8z/9Y8hbEl3j45w9ToJOeyYxRyoxQyp1leyBOTujG+9AJOu3NfuLMs8/DuLXYPhFt6hYODgxg8Hjc1VeVUG1+hrrKK+kozDnMVtqoK6q0VNNtqcFZb9Fw5prIXaah3kI7LPF4N8vjBp/zy5Hee/PYre3t7xONxDF6vF7PJRI3VSn1tDdVmE5XGl6nQ2zOVl2E1VWCz6kL9/Rrr62g/2ro/LueGA0xPjDI9M8v56Wny+TyKomAQRZGeaHR/tlR9YEsxrG9Isoz8DyRJRpQkPUoERQnBp6P/HEHo1hH2CQT0rxcMBilJpYPDJUrr/0KWSkh/X6hTchlK1v+TPwATME6f5ONSXAAAAABJRU5ErkJggg=='); background-size: cover; display: block;\"></span>\n  <img class='gatsby-resp-image-image' alt='UI for the people!' title='UI for the people!' 
src='/static/58980037f088b5305858e5d2763d3aa9/1d69c/ch_ui.png' srcset='/static/58980037f088b5305858e5d2763d3aa9/4dcb9/ch_ui.png 188w,\n/static/58980037f088b5305858e5d2763d3aa9/5ff7e/ch_ui.png 375w,\n/static/58980037f088b5305858e5d2763d3aa9/1d69c/ch_ui.png 750w,\n/static/58980037f088b5305858e5d2763d3aa9/75609/ch_ui.png 994w' sizes='(max-width: 750px) 100vw, 750px' style='width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;' loading='lazy'>\n    </span>\n<em>The UI for the CropperHead \"Service\"</em></p>\n<p>By following the few tutorials available on the <code class=\"language-text\">tfjs</code> website, I managed to pull together a rather decent-looking website. It was simple enough, with no extra shinies, and included all the necessary functionality: the user could select a photo from their computer to be analysed and see the estimated values. I used <a href=\"https://fengyuanchen.github.io/cropperjs/\"><code class=\"language-text\">cropper.js</code></a> for displaying the estimated crop on the web application. However, as <code class=\"language-text\">cropper.js</code> had such nice functionality for cropping the photo, I was inspired...</p>\n<h2>Gimme more data!</h2>\n<p>As is quite obvious from the text above, the implementation of the network was by no means optimal. On top of the shortcomings I already listed, there was one more major flaw: a lack of training data. Despite crawling through the photo library I had accumulated over the years, and trying to crowd-source data collection from friends who also enjoy photography, I only managed to collect around 300 photos. Considering the problem, I would estimate that I need around ten times more... So, how to get it?</p>\n<h3>Not happy? Then show me how!</h3>\n<p>The obvious way to collect the data in the right form, and without constraints, would be to utilise the UI! 
As people could add their own photos, and <code class=\"language-text\">cropper.js</code> supported cropping in the UI, it would be quite convenient to combine the two into a pipeline for collecting the much-needed data. A simple block of text assuring users of our benevolent means and needs, a hint of instructions and a couple of buttons later, the UI was ready to process the users' preferred crops!</p>\n<p><span class='gatsby-resp-image-wrapper' style='position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 750px; '>\n      <span class='gatsby-resp-image-background-image' style=\"padding-bottom: 51.06382978723405%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAYAAAC0VX7mAAAACXBIWXMAAA7DAAAOwwHHb6hkAAABb0lEQVQoz3WSSVLDMBBFcxZ27EIGx1PkUZJlyVNSxQaKA3ACjsJtP91KHByKLF6+Jat//5az6vse0+mMYZww9RbTOKAfRljXwRH2H1z3mFVrHUSlEVcOT2/feB6+EERH7OL8xp5JiofwmcOxhCWvlWktklwiKRps9TuC+uzXaaFJFdJS0+EK2yi7a7Lj9XWP3wVk7A0b0yIUNZSbYGwH40bIdkQurScmc07BRUuW5t4wXRhykrodUJsB0o6kPUrdodAOVdND1IaMOW3jiTJ5Z3pnaNrWH2A4CUfnEYO09PDdhKLyzMXMJhTErGJhSAm5u7ITmu4ETaPrWd1lT1FqplAOx6rxDTkAXxWrb54Uvx8lV9aPxSkiOnyIM0T8nFEBFYXzMxXznXq9TsXry0QLQ94IBXWkTlvZYf3xiXWmsQlSvNA4Nw5/CH91n+QXQ/4pZYNSGa95Q4mnVwhpkBUSWakI+QAFQVrpFkWt6U/v8AOxImHgUzZjpQAAAABJRU5ErkJggg=='); background-size: cover; display: block;\"></span>\n  <img class='gatsby-resp-image-image' alt='The instructions for data mining?' title='The instructions for data mining?' 
src='/static/fea111c7755c914b57effabffe0ac4d4/1d69c/ch_ui_2.png' srcset='/static/fea111c7755c914b57effabffe0ac4d4/4dcb9/ch_ui_2.png 188w,\n/static/fea111c7755c914b57effabffe0ac4d4/5ff7e/ch_ui_2.png 375w,\n/static/fea111c7755c914b57effabffe0ac4d4/1d69c/ch_ui_2.png 750w,\n/static/fea111c7755c914b57effabffe0ac4d4/75609/ch_ui_2.png 994w' sizes='(max-width: 750px) 100vw, 750px' style='width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;' loading='lazy'>\n    </span>\n<em>How to tell me how to do my job</em></p>\n<h3>Where does the data go?</h3>\n<p>The UI side was fixed, but what about the server side? How, where and when were the photos and labels processed? As I had some previous experience using <code class=\"language-text\">mongoDB</code>, <code class=\"language-text\">AWS</code> and microservices, I implemented a (not so simple) scalable data collection service - <a href=\"/projects/datalibrarian\"><code class=\"language-text\">DataLibrarian</code></a>! With it, I could easily set up a service which would nicely route the photos to an <em>AWS S3</em> bucket, the labels to <em>mongoDB</em> and keep a record of how they were paired. 
You can read more about the project in its dedicated project description.</p>\n<p><span class='gatsby-resp-image-wrapper' style='position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 640px; '>\n      <span class='gatsby-resp-image-background-image' style=\"padding-bottom: 79.25531914893618%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAQCAYAAAAWGF8bAAAACXBIWXMAAAsSAAALEgHS3X78AAAB70lEQVQ4y41Ti27CMAzk/79sD6GJSdskoFBRaNqUpu8XpXg+lyCYQCxS1Caxz+ezPem6Ax0OB8Jq25aappFdlpXc9X1PVVVR13UUx4Zgj/fZ5xd9/8zl3e7j8UgTrTUbxrTZbEkpRUmSyPY8n7IsJ99XVNe1gO73htI0EwJt28lumpbfRxJ4mygV0HaraOcHFIRamBVlyQF29Pb+QYuFI4BwhE3KQWCjoz0HiDlQLWdkAOAJ0jqdTpIeIhdFJRvRwRCGeV5SFMXMPKNhGCS9+dwh1/XEF37IAIEZcAQbhvF7b8Ep1BEzr+6+QzukLIAwtgxtQcBKqZD1C0gFITPMRVewQGDrY/0AiHTPDOkGEE4lawhdYZTnxbnqJRmTsvNwYf0UEGDjrpldKCABM0SRAApAK82/AEsGQxVRBKT88jql5XLNxSkYWMtdy3Jc+9wF/Gtgl60qVsRtMp3OpF3swiDYlrL/Nwyvxb4OgOjGJGSSlHCNBl6tXJGk74/cv9wBxSjLU0BERZWhbSSNXMkIAhST4zhrPicykjcp3+s9W3m0TJaloiMKBs0ubwzcn88I9hQQ2iAdpK11LGfo6Xk7bq0tM03568v0YLKeAuILMDDDOI4j2su/MYbHb8NSGBk/7IeAjyp/fYemXzqusLPrF12W2xmml4jVAAAAAElFTkSuQmCC'); background-size: cover; display: block;\"></span>\n  <img class='gatsby-resp-image-image' alt='Gotta love them flowcharts?' title='Gotta love them flowcharts?' 
src='/static/c6e26b14407fe36cb0d900a1fd7cb3b9/6af66/system.png' srcset='/static/c6e26b14407fe36cb0d900a1fd7cb3b9/4dcb9/system.png 188w,\n/static/c6e26b14407fe36cb0d900a1fd7cb3b9/5ff7e/system.png 375w,\n/static/c6e26b14407fe36cb0d900a1fd7cb3b9/6af66/system.png 640w' sizes='(max-width: 640px) 100vw, 640px' style='width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;' loading='lazy'>\n    </span>\n<em>Simplified view of the entire project architecture</em></p>\n<h2>All is well that ends well</h2>\n<p>So, with the few extra hours I had to myself after work, I managed to collect training data effectively by extracting crop information from <strong>Exif</strong>, create and train a <strong>Convolutional Neural Network</strong> to crop (and slightly rotate) photos automatically, design and publish a web application where users all around the world could try out the performance of the network, and build a method and infrastructure for collecting more training data to further improve my model! </p>\n<h3>What could I improve?</h3>\n<p>I am rather happy with the overall result. All of the individual blocks can naturally be improved (quite a lot), but the entire approach could also be different. For example, the following might be interesting changes:</p>\n<ul>\n<li>Using object identification to analyse what to include in the crop</li>\n<li>Using a predefined crop factor</li>\n<li>Using a different loss, such as Intersection over Union (IoU)</li>\n</ul>\n<p> Nevertheless, the entire project served its purpose as an entertaining way to try and learn new things. Implementing such a solution from start to finish will hopefully help me to learn extensively, communicate with the appropriate professionals and concentrate on more pressing issues. 
Such a wonderful time!</p>"}},"pageContext":{"slug":"cropperhead","navContext":{"next":{"path":"/projects/exifextractor","title":"ExifExtractor - Find your edits","slug":"exifextractor","links":["https://www.github.com/PebbleBonk/ExifAnnotator"]},"prev":{"path":"/projects/texmindmapper","title":"TexMindMapper - visualise your thesis","slug":"texmindmapper","links":["https://www.github.com/PebbleBonk/TexMindMapper","https://tex-mind-mapper.herokuapp.com"]}},"links":["https://www.github.com/PebbleBonk/CropperHead","https://www.github.com/PebbleBonk/CropperHeadUI","https://cropper-head.herokuapp.com"]}}}