Logistic regression is a classic supervised learning algorithm used for binary classification, that is, classifying data into exactly two categories. Think decisions like “spam or not spam”, “fraud or legitimate”, “cat or dog”.
Despite the name, logistic regression isn't regression in the typical sense, like linear regression. Instead of predicting a continuous value, it estimates the probability that an input belongs to a particular class. To do this, it first computes a weighted sum of the input features, then passes that result through a function called the sigmoid, which squashes the output to a value between 0 and 1. If the result is closer to 1, we interpret it as 'yes'; if it's closer to 0, 'no'.
How does logistic regression work?
- It takes the input features, let's say pixel intensity, age, temperature, or anything numeric.
- It computes a weighted sum of these inputs.
- That sum, call it z, is then passed through the sigmoid function, σ(z) = 1 / (1 + e^(-z)).
- The result is a probability. You choose a threshold (like 0.5) to decide if it is a 1 or a 0, as in the sketch below.
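Here is a minimal NumPy sketch of those four steps. The feature values, weights, and bias are made up for illustration; in practice the weights would be learned from data:

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs: three numeric features for one example
x = np.array([0.8, 1.5, -0.3])

# Hypothetical learned weights and bias
w = np.array([1.2, -0.7, 2.0])
b = 0.1

z = np.dot(w, x) + b          # step 2: weighted sum of the inputs
prob = sigmoid(z)             # step 3: squash it to a probability
label = int(prob >= 0.5)      # step 4: threshold at 0.5 to get a 1 or a 0

print(f"probability={prob:.3f}, predicted label={label}")
```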
Why the sigmoid function?
The sigmoid function transforms any real-valued number into something between 0 and 1. This makes it perfect for binary classification problems, where we want to say either “yes” or “no” based on a probability. Plus, its curve has a nice property: it transitions smoothly between the extremes, which makes gradient-based optimization easy to apply.
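If you want to see the squashing in action, here's a quick check on a few arbitrary inputs (the values are purely illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs land near 0, zero lands at exactly 0.5,
# and large positive inputs land near 1
for z in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.4f}")
```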
The perceptron trick connection
Logistic regression is often seen as a smooth upgrade to the classic perceptron algorithm, which just says “yes” or “no” based on a hard cutoff. Logistic regression instead applies the sigmoid function to produce a soft decision boundary, yielding probabilistic outputs rather than hard class assignments. This is an advantage when the data isn't linearly separable: the perceptron's updates never settle, while logistic regression's smooth loss still converges to the best linear fit it can find. You still update weights like in the perceptron, but now you are optimizing a smoother, more forgiving error landscape.
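To make the contrast concrete, here's a small sketch comparing the perceptron's hard step with the sigmoid's soft output on the same weighted sums (the numbers are made up):

```python
import numpy as np

def step(z):
    # Perceptron: a hard yes/no cutoff at zero
    return (z >= 0).astype(int)

def sigmoid(z):
    # Logistic regression: a soft, probabilistic output
    return 1.0 / (1.0 + np.exp(-z))

# Two borderline examples and one confident one
z = np.array([-0.1, 0.1, 3.0])

print("step:   ", step(z))      # [0 1 1] -- no sense of how close the call was
print("sigmoid:", sigmoid(z))   # ~[0.475 0.525 0.953] -- borderline cases stay near 0.5
```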
What about loss? Enter Maximum Likelihood
Instead of using squared error like in linear regression, logistic regression uses log loss, derived from maximum likelihood estimation (MLE). In short:
- You want your model to assign high probability to the correct labels.
- The log loss punishes confident wrong predictions more heavily than uncertain ones.
MLE turns this into a game of maximizing the chance that your model’s predicted probabilities match the true labels. It’s the probabilistic backbone of logistic regression.
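One way to see the “punishes confident wrong predictions” point is to compute the per-example log loss by hand (the predicted probabilities below are made up):

```python
import numpy as np

def log_loss(y_true, p_pred):
    # Binary cross-entropy for one example: -[y*log(p) + (1 - y)*log(1 - p)]
    return -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

# The true label is 1 in every case below
print(log_loss(1, 0.9))   # confident and right  -> ~0.11
print(log_loss(1, 0.6))   # uncertain and right  -> ~0.51
print(log_loss(1, 0.4))   # uncertain and wrong  -> ~0.92
print(log_loss(1, 0.1))   # confident and wrong  -> ~2.30
```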
How to use logistic regression with FiftyOne
One way to turn logistic regression into a computer vision-ready classifier is to train it on fixed-length feature vectors, most commonly embeddings you extract from a pretrained neural network.
FiftyOne makes this workflow easy: you pull in a dataset, compute embeddings, and then feed those vectors to a logistic regression model. Once the model has learned the mapping, you can write the predictions back to the dataset as classification objects and use FiftyOne’s built-in evaluation and visualization to inspect mistakes, confidence histograms, and per-sample accuracy. Because the logistic layer is linear, training is fast even for tens of thousands of images.
The following code shows an example of this workflow.
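It's a minimal sketch rather than the exact recipe: it assumes scikit-learn for the logistic layer, the cifar10 zoo dataset filtered to two classes, the resnet50-imagenet-torch zoo model as the feature extractor, and illustrative field names like lr_predictions and eval_lr, all of which you would swap for your own.

```python
import numpy as np
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a labeled dataset and keep just two classes to match the binary setup
dataset = foz.load_zoo_dataset("cifar10", split="test")
view = dataset.match(F("ground_truth.label").is_in(["cat", "dog"]))

# Use a pretrained network as a feature extractor: one embedding per image
model = foz.load_zoo_model("resnet50-imagenet-torch")
embeddings = view.compute_embeddings(model)
labels = np.array(view.values("ground_truth.label"))

# Fit the logistic layer on the embeddings
X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=51
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# Predict on every sample and write the results back as Classification labels
probs = clf.predict_proba(embeddings)
preds = clf.classes_[probs.argmax(axis=1)]
view.set_values(
    "lr_predictions",
    [
        fo.Classification(label=str(label), confidence=float(conf))
        for label, conf in zip(preds, probs.max(axis=1))
    ],
)

# Evaluate and explore the results in the App
view.evaluate_classifications(
    "lr_predictions", gt_field="ground_truth", eval_key="eval_lr"
).print_report()
session = fo.launch_app(view)
```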
Summary of why logistic regression is popular
- It’s fast and interpretable.
- It works well when your classes are linearly separable.
- It’s great for baselines before you bring out the deep learning flamethrower.
- It plays nicely with small and medium datasets.
Logistic regression is your go-to when you want a fast, reliable model to answer binary questions. It uses the sigmoid to make probabilistic decisions, MLE to fine-tune its confidence, and a vibe borrowed from the perceptron but with better math.