Krisztian Kovacs — Teaching computers to learn.

<h1 id="object-localization-using-saliency-maps">Object Localization Using Saliency Maps</h1>
<p><em>2019-04-12</em></p>
<p>I’m participating in <a href="https://fellowship.ai/">fellowship.ai</a>, and one of the projects I was working on involved object localization. Specifically, how to do it if we only have class labels (no bounding box or segmentation info).</p>
<p>Turns out we can use class activation maps, produced by techniques such as <a href="https://arxiv.org/abs/1610.02391">GradCAM</a> or <a href="https://arxiv.org/abs/1710.11063">GradCAM++</a>, to obtain a saliency map from any classifier, which can then be converted into a bounding box.</p>
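<p>As a rough illustration of the second step, here is a minimal numpy sketch of turning a saliency map into a box. The function name and the threshold are illustrative choices, not the exact pipeline we used:</p>

```python
import numpy as np

def saliency_to_bbox(saliency, threshold=0.5):
    """Convert a 2-D saliency map (values in [0, 1]) into an
    (x_min, y_min, x_max, y_max) box by thresholding the map
    and taking the extent of the remaining activations."""
    mask = saliency >= threshold
    if not mask.any():
        return None  # nothing salient enough
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy example: a bright patch (rows 2-4, columns 1-4) in an 8x8 map.
smap = np.zeros((8, 8))
smap[2:5, 1:5] = 0.9
print(saliency_to_bbox(smap))  # (1, 2, 4, 4)
```

<p>In practice you would also smooth the map and maybe keep only the largest connected component, but the thresholding idea is the core of it.</p>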
<p>See our <a href="https://platform.ai/blog/page/9/attention-cropping-in-platform-ai/">team’s article</a> describing the approach in more detail.</p>

<h1 id="tips-and-tricks-from-andrew-ngs-class">Tips and Tricks from Andrew Ng’s Class</h1>
<p><em>2018-10-18</em></p>
<p>Andrew Ng mentioned lots of tips and tricks during his <a href="/review-blitzing-through-the-coursera-dl-specialization/">Coursera class</a>. Below I list a miscellaneous subset. Many are not even DL-specific. They are in no particular order; some people might find them obvious (especially with hindsight). I found them helpful and interesting.</p>
<h3 id="random-search-beats-grid-search">Random Search beats Grid Search</h3>
<p>We often have to optimize multiple hyperparameters. For example, suppose you have a standard feed-forward neural net, with 1 hidden layer. You are trying to decide on the number of hidden units, and the amount of dropout. You have resources to fit multiple models, and compare their results on your validation set.</p>
<p>One way to search the hyperparameter space is grid-search: run all combinations of a discrete subset of parameters. For example, you might consider a network with 50, 100, 150, and 200 hidden units; and dropout rates of 0.1, 0.2, … 0.5. Take all combinations of these values, and you get 20 different settings to run your network with.</p>
<p>Or you can just take 20 random samples from the relevant parameter ranges: 50–200 for hidden units, and 0.1–0.5 for dropout.</p>
<p>Why is the second approach better? Because not all hyperparameters have the same impact. Suppose that in this case dropout doesn’t make a big difference, but the number of hidden units matters. With grid-search, you have 20 runs, but only sample 4 distinct values of the parameter that matters. With random search, you take 20 samples for the number of hidden units as well.</p>
<p>This effect gets even more pronounced in higher dimensions. The number of runs required for grid-search explodes. Why waste runtime on parameters that might not even matter?</p>
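<p>The point is easy to see in code. With the numbers from the example above (illustrative values, using only the standard library):</p>

```python
import random

random.seed(0)

# Grid search: 4 hidden-unit values x 5 dropout values = 20 runs,
# but only 4 distinct hidden-unit settings are ever tried.
grid = [(h, d) for h in (50, 100, 150, 200)
               for d in (0.1, 0.2, 0.3, 0.4, 0.5)]

# Random search: also 20 runs, but each draws both parameters
# independently, so we try 20 distinct hidden-unit settings
# (rounding to an int before building the actual network).
rand = [(random.uniform(50, 200), random.uniform(0.1, 0.5))
        for _ in range(20)]

print(len({h for h, _ in grid}))  # 4
print(len({h for h, _ in rand}))  # 20
```

<p>If dropout turns out to barely matter, grid search has spent 20 runs exploring only 4 values of the parameter that does.</p>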
<h3 id="multiple-minima-are-not-an-issue-in-deep-learning">Multiple minima are not an issue in deep learning</h3>
<p>We often illustrate gradient descent with a 2-dimensional picture. In two dimensions it’s easy to picture a curve with multiple minima, where gradient descent (or any other optimization algorithm) finds a sub-optimal local minimum. This intuition doesn’t carry over to higher dimensions.</p>
<p>To have a local minimum, your directional derivative has to be 0 in all directions. In very high dimensions, the probability that this happens at a sub-optimal point is fairly small. If you find a local minimum, chances are you hit the global one.</p>
<p>However, your surface will be full of (relatively) flat areas and saddle points (where your gradient is 0 in some directions, but not all). Flat surfaces and saddle points are problematic because they make your update steps tiny, and your optimization algorithm may not find a minimum in a reasonable amount of time.</p>
<h3 id="dont-use-rank-1-arrays">Don’t use rank 1 arrays</h3>
<p>If you sum an (n, n) matrix over one axis in numpy, the result is an (n,) rank-1 array. This can lead to bugs (what if you summed over the wrong axis?), and makes broadcasting unclear. It’s better to avoid rank-1 arrays for clarity.</p>
<p>Instead, we should sum to an (n, 1) or (1, n) array. That makes it explicit which dimension we are summing over. We just need to pass the option ‘keepdims=True’.</p>
<p>Also, we should pepper our code with assertions that check the shapes of our arrays (putting ‘assert W.shape == (x, y)’ into our code). The computational overhead is minimal, and the assertions help us find errors quickly.</p>
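<p>Both tips together, in a short numpy snippet:</p>

```python
import numpy as np

A = np.random.randn(4, 4)

col_sums = A.sum(axis=0)                 # shape (4,)  -- rank-1, ambiguous
row_sums = A.sum(axis=1, keepdims=True)  # shape (4, 1) -- explicitly a column

# Shape assertions catch axis mix-ups immediately.
assert col_sums.shape == (4,)
assert row_sums.shape == (4, 1)

# keepdims also makes broadcasting unambiguous:
# a (4, 1) column divides each row of A by its own sum.
normalized = A / A.sum(axis=1, keepdims=True)
assert normalized.shape == (4, 4)
```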
<h3 id="build-something-quickly-iterate-fast">Build something quickly, iterate fast</h3>
<p>Andrew’s recurring suggestion is to quickly iterate. Instead of trying to build a hyper-complex model from scratch, build a simple one, then look at where the largest margins of improvement are.</p>
<p>To facilitate quick iteration, a single number evaluation metric is extremely helpful. It’s hard to iterate if we have multiple metrics, each preferring a different model. To deal with multiple metrics we have two strategies.</p>
<p>First, we can combine them. For example, we can turn precision and recall into an f1 (or f2) score.</p>
<p>Alternatively, we can designate satisficing metrics. Satisficing metrics only have to be over (or under) a certain threshold. For example, we might say that the memory requirement of our model is a satisficing metric. We want our model to fit into memory, but beyond that we don’t care about its size. We would then have one optimizing metric, say the f1 score, with the constraint that our model fits into memory.</p>
<h3 id="changing-proportions-of-traindevtest-sets-for-big-data">Changing proportions of train/dev/test sets for big data</h3>
<p>60/20/20 was a typical train/validation/test split proportion for machine learning in the pre-deep-learning era. However, with tens of millions of examples, this split doesn’t make sense anymore. After all, the validation set is only used for monitoring performance and optimizing hyper-parameters, and the test set only serves to calculate our error. Often, 1% of our data is enough for them.</p>
<h3 id="human-error-helping-with-biasvariance">Human error helping with bias/variance</h3>
<p>For many deep learning tasks it is helpful to know, at least approximately, what human error is. For naturally occurring tasks, such as image classification, human error is plausibly close to the <a href="https://en.wikipedia.org/wiki/Bayes_error_rate">Bayes optimal error</a>. Knowing human/Bayes error, we know how much our model can still improve.</p>
<p>Moreover, knowing human error helps us determine if our algorithm suffers from high bias. We normally treat the training set error as a measure of our algorithm’s bias, and the difference between our training and validation error as the algorithm’s variance.</p>
<p>That’s not fair though, as some of the training set error might be unavoidable. For example, when doing image classification, some of our training examples might be mislabeled. Even a perfect model would have training set error.</p>
<p>Instead, we should focus on the avoidable bias. Knowing human error comes in handy here. If our model has a 2% training error on an image classification task, but humans also have a 2% error (maybe because of mislabeled examples), we know that reducing bias is not the best margin for improvement. However, if humans have a ~0% error, it is worth trying bias reducing techniques.</p>
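<p>The decomposition is just two subtractions. A tiny sketch (error values in percent, and the function name is mine):</p>

```python
def diagnose_bias_variance(human_err, train_err, val_err):
    """Split total error into avoidable bias and variance,
    using human error as a proxy for Bayes error."""
    avoidable_bias = train_err - human_err
    variance = val_err - train_err
    return avoidable_bias, variance

# Humans at 2%: the 2% training error is essentially unavoidable.
print(diagnose_bias_variance(2.0, 2.0, 3.0))  # (0.0, 1.0)

# Humans near 0%: the same training error now signals high bias.
print(diagnose_bias_variance(0.0, 2.0, 3.0))  # (2.0, 1.0)
```

<p>Whichever of the two numbers is larger tells you where the better margin for improvement is.</p>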
<h3 id="incorrectly-labeled-training-examples">Incorrectly labeled training examples</h3>
<p>Not necessarily a problem, as long as there aren’t too many of them, and the mislabeling isn’t systematic.</p>
<h3 id="what-to-do-if-traindev-sets-have-different-distributions">What to do if train/dev sets have different distributions</h3>
<p>Validation and test sets should always have the same distributions, but sometimes our training set can have a different one.</p>
<p>For example, suppose we are building an app that classifies pictures into cats vs. non-cats. Our app will be used for images recorded on a smartphone. We have 10000 such images that we can split into a validation and test set.</p>
<p>Our training set is slightly different. We have 1 million pictures downloaded from the internet. These aren’t exactly the same as the ones recorded on the phone: they might be higher resolution, the cat is usually in the middle, etc. But we have a lot more of them.</p>
<p>We fit a model. It has 2% training error, but the error on our validation set is a whopping 8%. Does our model suffer from high variance, or is it just our training and validation sets being different?</p>
<p>To answer this question, we can create a separate ‘train-dev’ set. This is data of the same distribution as our training set that isn’t used for training.</p>
<p>In our example, we would set aside a subset of the 1 million images from the internet. We use the rest for training. Now we can compare errors.</p>
<p>If our model has an 8% error on this train-dev set, we know that it suffers from high variance. We could try variance-reducing techniques, such as increasing regularization. On the other hand, if it scores 2% on the train-dev set (same as the training error), we know that the problem is the differing distributions for training and validation. We need to collect more mobile phone data so we can train our model on that.</p>
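<p>The diagnosis boils down to comparing the three error rates. A sketch of the logic; the 0.5% tolerance is an arbitrary choice of mine, not a standard value:</p>

```python
def diagnose(train_err, train_dev_err, val_err, tolerance=0.005):
    """Attribute the train/validation gap to variance vs. data
    mismatch using a held-out train-dev split."""
    if train_dev_err - train_err > tolerance:
        # The gap appears even on same-distribution data.
        return "high variance"
    if val_err - train_dev_err > tolerance:
        # The gap only appears when the distribution changes.
        return "train/dev data mismatch"
    return "no large gap"

print(diagnose(0.02, 0.08, 0.08))  # high variance
print(diagnose(0.02, 0.02, 0.08))  # train/dev data mismatch
```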
<h3 id="few-shot-learning-with-siamese-network">Few shot learning with Siamese network</h3>
<p>Siamese networks look <a href="https://www.quora.com/What-are-Siamese-neural-networks-what-applications-are-they-good-for-and-why">awesome</a>.</p>
<h3 id="word-embeddings-dont-need-a-complex-model">Word embeddings don’t need a complex model</h3>
<p>A simple model like skip-grams suffices. To learn embeddings efficiently (avoiding the massive computation needed for a softmax classifier) we can use algorithms such as negative sampling or GloVe.</p>

<h1 id="review-blitzing-through-the-coursera-dl-specialization">Review: Blitzing Through the Coursera DL Specialization</h1>
<p><em>2018-10-14</em></p>
<p>I recently completed Andrew Ng’s <a href="https://www.coursera.org/specializations/deep-learning">deep learning specialization</a> on Coursera.</p>
<p><strong>TL;DR:</strong> it’s a great resource. If you are familiar with ‘vanilla’ machine learning and would like to understand deep learning, it’s the perfect place to start. The course teaches both basic DL theory and tips and tricks on how to optimize your workflow. The lectures are really easy to understand and follow (great), and the coding assignments are also extremely simple (not so great). Coursera says that the course takes 4–5 months to complete, but if you’re dedicated it should take less than one month.</p>
<h2 id="overview">Overview</h2>
<p>The specialization consists of 5 courses.</p>
<ol>
<li><strong>Neural Networks and Deep Learning.</strong> Covers basic feedforward networks. Explains shallow & deep networks, parameter initialization, gradient descent, backpropagation, but doesn’t dwell too much on the math and proofs.</li>
<li><strong>Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization.</strong> This module explains the ideas that make NNs work well in practice. One such topic is regularization: weight decay, dropout, batch norm. Enhanced optimization algorithms are also covered: momentum, RMSprop, ADAM. Plus some miscellaneous topics: hyperparameter tuning, exploding/vanishing gradients, gradient checking, etc.</li>
<li><strong>Structuring Machine Learning Projects.</strong> A short module about overall DL project strategy and practical tips and tricks. Topics cover different aspects of what to focus on and how to analyze the errors of your algorithm.</li>
<li><strong>Convolutional Neural Networks.</strong> A big overview of computer vision. Goes over building blocks like convolutions, padding, strides, pooling; covers common CNN architectures; and also explains topics such as object-detection, Siamese networks, or neural style transfer.</li>
<li><strong>Sequence Models.</strong> Exactly what you would expect: RNNs, GRUs, LSTMs. This course also goes into the different ways to create word embeddings. At the end, it explains attention models.</li>
</ol>
<h2 id="prerequisites">Prerequisites</h2>
<p>The first course assumes no background, but to get the most out of it, you should know some machine learning and linear algebra.</p>
<p>Each subsequent course builds on the knowledge of the previous one, so it’s best to take them in sequence.</p>
<h2 id="lectures">Lectures</h2>
<p>I don’t have much to say about them; they are fantastic. Andrew’s explanations are simple, even for somewhat advanced concepts. He doesn’t shy away from using math, but limits it to where it makes a practical difference (we are spared the proof of backpropagation). Sometimes he discusses code, but not too often; that is left to the assignments. The lectures are made up of ~10min long videos - the perfect length.</p>
<h2 id="programming">Programming</h2>
<p>Programming assignments start with numpy in the first course; no DL frameworks just yet. I find that great: coding networks up from scratch really makes you understand the building blocks.</p>
<p>Starting with course 2, you are introduced first to TensorFlow, then to Keras. You will still need to implement numpy code from time to time, but coding in the frameworks takes more and more of the emphasis. I think the transition is well-timed.</p>
<p>I also found the assignment topics interesting. For example: classifying cats vs. non-cats (of course), transferring the impressionistic style of Monet to a picture of the Louvre, generating new dinosaur names, generating jazz music, and more.</p>
<p>I do have a beef with the assignments: they are way too easy, involving too much hand-holding. Usually, you are given an empty function that you have to fill in based on instructions and pseudo-code. Very often you only have to write a few lines of code, and much of the assignments can be completed by someone with no knowledge of DL who just reads the pseudo-code carefully.</p>
<h2 id="blitzing-through-the-course">Blitzing through the Course</h2>
<p>You can audit the course for free. Auditing means you can watch all the videos, but can’t take the quizzes and programming assignments. If you want to do those, you have to sign up. You get 7 days for free, after that you’ll have to pay.</p>
<p>I was inclined to finish the course quickly, so I decided to first audit the class and watch most of the lectures. Then I signed up, and aimed to complete quizzes plus programming in a week. It wasn’t too hard to do that, most assignments take 2 hours tops.</p>
<p>Once you complete everything you get a lovely certificate:</p>
<p><img src="/assets/img/dl_certificate.jpg" alt="png" /></p>
<p>I don’t think the certificate is worth much, but it feels good to have completed this course. Like Andrew’s (now somewhat aged) machine learning course, I expect it to become the standard go-to beginner’s tutorial.</p>

<h1 id="deep-learning-study-plan">Deep Learning Study Plan</h1>
<p><em>2018-09-01</em></p>
<p>I plan to spend the next 6 months, roughly 25 weeks, <a href="/giving-dl-a-shot/">studying deep learning</a>, commenting on each week’s achievements as I go. I’ll spend the first 10 weeks getting up to speed with the material, using various online resources (see below). I don’t have a clear plan for weeks 11–25 yet; it will mostly depend on what I find most fascinating in the first ten weeks. I expect that as I progress my focus will gradually shift from studying material to completing interesting projects.</p>
<p>For the first 10 weeks, I plan to complete the following online courses:</p>
<ul>
<li><a href="http://course.fast.ai/">Fast.ai course 1 & 2</a></li>
<li><a href="https://www.coursera.org/specializations/deep-learning">Andrew Ng’s 5-part DL Course</a></li>
<li><a href="http://www.deeplearningbook.org/">Ian Goodfellow’s Deep Learning Book</a></li>
<li><a href="http://rail.eecs.berkeley.edu/deeprlcourse/">Berkeley’s Deep Reinforcement Learning</a></li>
</ul>
<p>I listened to some of the fast.ai course already, and they drop you right into the subject. I care both about fundamentals and practical applications, and usually I proceed to study in that order. Fast.ai’s top-down methodology reverses that approach, and gives you the applications first, theory later. I’ll be curious to see how their method works. They cover a lot of material, obtain stellar reviews, and have an active forum community.</p>
<p>Andrew Ng’s course and Ian Goodfellow’s book go the other way — building up from the fundamentals. I’ll use them to fill in holes in my understanding. That will produce some redundancy, as I’ll go over the same material multiple times. Not ideal efficiency-wise, but it will hopefully cement my understanding.</p>
<p>The RL course is on the list because the other resources don’t treat the topic.</p>
<h2 id="timeline">Timeline</h2>
<p>My timeline is intentionally optimistic and challenging. It serves as a motivating standard to compare myself against, even if I don’t finish on time.</p>
<h3 id="week-1---review">Week 1 - Review</h3>
<p>Since I haven’t done any coding and math lately, I have to get up to speed.</p>
<p>For the math, I’ll go over part 1 of Goodfellow’s book. The parts that come easy I’ll skim; the parts that I find difficult I’ll study in detail.</p>
<p>I’ll also finish Andrew Ng’s course on machine learning. It doesn’t contain much new material for me, but it serves well for a review. I’ve completed about half the course already, the rest shouldn’t take long. For assignments, I’ll use <a href="https://github.com/JWarmenhoven/Coursera-Machine-Learning">the unofficial Python notebooks</a> (there is no point in learning Octave).</p>
<p>I mostly used R in the past, so I also have to get up to speed with my Python syntax. I’ll go over <a href="https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp">Udemy’s Python for Data Science and Machine Learning</a> course, skipping most of the theory, focusing on the syntax of the algorithms I already know.</p>
<h3 id="weeks-2-3---fastai-course-1">Weeks 2-3 - Fast.ai Course 1</h3>
<p>Previously I have done some lessons of this course - but not on a serious level. I plan to start from the beginning, reproducing the notebooks discussed in the videos.</p>
<h3 id="weeks-4-5---andrew-ngs-dl-course-goodfellow-part-2">Weeks 4-5 - Andrew Ng’s DL Course, Goodfellow Part 2</h3>
<p>I don’t know much about the DL course, but based on the suggested weekly hours, I should be able to complete it in 2 weeks.</p>
<h3 id="weeks-6-10---fastai-course-2-rl-course-goodfellow-part-3">Weeks 6-10 - Fast.ai Course 2, RL Course, Goodfellow Part 3</h3>
<p>A long time (5 weeks) and a lot of material. I’ll probably post a more detailed breakdown as I approach week 6.</p>

<h1 id="diving-into-deep-learning">Diving into Deep Learning</h1>
<p><em>2018-08-29</em></p>
<p>I’ve decided to give deep learning a shot.</p>
<p>There is definitely a lot of hype about the topic, and it sounds quite interesting. I’m not sure yet whether I want to treat it as a fun side-project or a potential career, but I do have a lot of free time nowadays, and learning DL seems like a good investment.</p>
<p>It shouldn’t be too hard. I have experience with “traditional” machine learning, I’m comfortable with heavy math, and I’m familiar with simple feed-forward neural nets.</p>
<p>I’ll treat it as a “flexible” 6 month project, spending 20-40 hours a week on it. I’ll post my tentative study plan in a few days.</p>