How HTTPS provides connection security: what every web developer should know
So how does HTTPS actually work? As a web developer, I knew that using HTTPS to protect user data is a very good idea, but I never had a clear understanding of how it actually works.
How is the data protected? How can a client and a server establish a secure connection if someone is already eavesdropping on the channel? What is a security certificate, and why do I have to pay someone to get one?
Pipeline
Before we dive into the mechanics, let’s briefly talk about why protecting an Internet connection matters and what HTTPS protects against.
When a browser sends a request to your favorite website, that request passes through many different networks, any of which can potentially be used to eavesdrop on or interfere with the established connection.
From your own computer to other computers on your local network, through routers and switches, through your Internet service provider (ISP), and through many intermediate providers, a huge number of organizations relay your data. An attacker who controls even one of them can intercept the data in transit.
Normally these requests are sent over HTTP, where both the client’s request and the server’s response travel in plaintext. There are strong arguments for why HTTP does not encrypt by default:
• Encryption requires more processing power;
• More data is transmitted;
• Responses cannot be cached by intermediate proxies.
But when sensitive data travels over the channel (for instance, passwords or credit card details), additional measures are needed to prevent this kind of eavesdropping.
Transport Layer Security (TLS)
Now we are going to dive into the world of cryptography, but no special background is needed: we will cover only the most general ideas. Cryptography protects a connection from attackers who want to tamper with it or simply listen in.
TLS, the successor to SSL, is the protocol most commonly used to establish secure HTTP connections (known as HTTPS). TLS sits one level below HTTP in the OSI model. Put simply, this means that when a request is made, everything related to the TLS connection happens first, and only then everything related to the HTTP connection.
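The "TLS first, then HTTP" ordering can be sketched with Python’s standard `socket` and `ssl` modules. This is only a minimal illustration, not a description of how a browser is implemented; the helper names `build_request` and `fetch_status_line` are hypothetical and exist only for this example.

```python
import socket
import ssl


def build_request(host: str) -> bytes:
    """Build a minimal plain-text HTTP/1.1 HEAD request for the given host."""
    return (
        "HEAD / HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def fetch_status_line(host: str, port: int = 443) -> str:
    """Open a TCP connection, run the TLS handshake, then speak HTTP over it."""
    # The default context verifies the server certificate and the host name.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as tcp_sock:
        # Step 1: the TLS handshake happens here, before any HTTP bytes are sent.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            # Step 2: the HTTP request travels inside the encrypted channel.
            tls_sock.sendall(build_request(host))
            response = tls_sock.recv(4096)
    # Return only the first line of the response, e.g. the HTTP status line.
    return response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
```

Calling `fetch_status_line("example.com")` needs network access; `example.com` is just a placeholder host. Note that the HTTP request itself is ordinary plain text: the protection comes entirely from the TLS layer wrapped around the socket.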
TLS is a hybrid cryptographic system. This means it uses several cryptographic approaches, which we will look at next:
1) Asymmetric encryption (a public-key cryptosystem) to generate a shared secret key and to authenticate (that is, to prove your identity).
2) Symmetric encryption, which uses the secret key to encrypt subsequent requests and responses.
Public-key cryptosystem
A public-key cryptosystem is a type of cryptographic system that uses pairs of keys: a public key and a private key that are mathematically related to each other. The public key is used to encrypt a text message into “gibberish”, whereas the private key is used to decrypt it and recover the original text.
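The article does not name a specific algorithm here, so as an assumption the idea is illustrated below with textbook RSA and deliberately tiny primes. This is a toy sketch for intuition only: real keys are thousands of bits long and real systems add padding, so nothing here should be treated as a usable implementation.

```python
from typing import Tuple

# Toy RSA with tiny primes: illustrative only, far too small to be secure.
p, q = 61, 53
n = p * q                    # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)          # safe to publish
private_key = (d, n)         # must stay secret


def encrypt(message: int, key: Tuple[int, int]) -> int:
    """Encrypt a small integer message with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: Tuple[int, int]) -> int:
    """Recover the original message with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65                                  # must be smaller than n
ciphertext = encrypt(message, public_key)     # the "gibberish"
recovered = decrypt(ciphertext, private_key)  # equals the original message
```

Anyone may hold `public_key` and produce `ciphertext`, but only the holder of `private_key` can turn it back into `message`.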
A message encrypted with the public key can only be decrypted with the corresponding private key. Neither key can perform both functions. The public key can be published openly without exposing your system to threats, but the private key must never be revealed to anyone who is not entitled to decrypt the data. So we have two keys: a public one and a private one.
One of the most impressive benefits of asymmetric encryption is that two parties who have never met can establish a secure connection while initially exchanging data over an open, unsecured channel. The client and the server each use their own private key together with the published public key to generate a shared secret key for the session.
This means that someone sitting between the client and the server and monitoring the connection cannot learn the client’s private key, the server’s private key, or the secret session key.
So how is this possible? Mathematics!
Diffie-Hellman key exchange
One of the most popular approaches is the Diffie-Hellman (D-H) key exchange algorithm. It allows the client and the server to agree on a shared secret key without ever sending that key over the connection. As a result, attackers eavesdropping on the channel cannot determine the secret key even if they intercept every single packet.
Once the key exchange has taken place via the D-H algorithm, the resulting secret key can be used to encrypt the rest of the session with much simpler symmetric encryption.
Some math…
The mathematical functions underlying this algorithm have an important property: they are relatively easy to compute in the forward direction but practically impossible to reverse. This is where very large prime numbers come in.
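To get a feel for this asymmetry, here is a small sketch that compares the two directions. The function names `mix` and `brute_force_secret` and the tiny prime are assumptions chosen so the example finishes instantly; the naive search shown is not the best known attack, but even the best known ones are considered infeasible for well-chosen parameters of realistic size.

```python
def mix(root: int, secret: int, prime: int) -> int:
    """Forward direction: modular exponentiation, fast even for huge numbers."""
    return pow(root, secret, prime)


def brute_force_secret(root: int, mixture: int, prime: int) -> int:
    """Reverse direction: try exponents one by one until the mixture matches."""
    value = 1  # root ** 0
    for candidate in range(prime):
        if value == mixture:
            return candidate
        value = (value * root) % prime
    raise ValueError("no exponent found")


prime = 7919     # a small prime, so the search below stays cheap
root = 7
secret = 4321

mixture = mix(root, secret, prime)                    # a single fast operation
recovered = brute_force_secret(root, mixture, prime)  # up to `prime` steps

# `recovered` is an exponent that reproduces the mixture.
assert mix(root, recovered, prime) == mixture
```

The forward step stays fast when `prime` has hundreds of digits, while the number of steps in the naive search grows with the size of `prime` itself, which quickly becomes astronomically large.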
Suppose Alice and Bob are the two parties performing a key exchange via the D-H algorithm. First they agree on a base, called root (usually a small number such as 2, 3 or 5), and on a large prime number, called prime (more than 300 digits). Both values are sent publicly over the communication channel without any risk of compromising the connection.
Remember that Alice and Bob each have their own private key (more than 100 digits long), which is never sent over the communication channel.
Only a mixture derived from the private key and the values prime and root is sent over the channel.
Thus:
Alice’s mixture = (root ^ Alice’s Secret) % prime
Bob’s mixture = (root ^ Bob’s Secret) % prime
where % is the remainder after division.
So Alice creates her mixture from the agreed constants (prime and root), and Bob does the same. Once they have received each other’s mixtures, they perform one more calculation to obtain the private session key.
Namely:
Alice’s calculations:
(Bob’s mixture ^ Alice’s Secret) % prime
Bob’s calculations:
(Alice’s mixture ^ Bob’s Secret) % prime
The end result of these operations is the same number for both Alice and Bob, because raising root to Alice’s Secret and then to Bob’s Secret gives the same value modulo prime as doing it in the opposite order. This number becomes the private key for the current session. Note that neither party had to send its own private key over the communication channel, and the resulting secret key is never transmitted over the open connection either. Great!
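The whole exchange can be sketched in a few lines of Python. The parameters below are assumptions picked for readability: the prime 2**127 - 1 is far smaller than the 300-digit primes mentioned above, and real deployments rely on carefully chosen, standardized parameters rather than ad hoc values like these.

```python
import secrets

# Public parameters: agreed in the clear, visible to any eavesdropper.
prime = 2**127 - 1   # a known prime, much smaller than real-world sizes
root = 3

# Private keys: generated locally and never sent over the channel.
alice_secret = secrets.randbelow(prime - 2) + 1
bob_secret = secrets.randbelow(prime - 2) + 1

# Mixtures: the only values that travel over the open channel.
alice_mixture = pow(root, alice_secret, prime)
bob_mixture = pow(root, bob_secret, prime)

# Each side combines the other's mixture with its own private key.
alice_session_key = pow(bob_mixture, alice_secret, prime)
bob_session_key = pow(alice_mixture, bob_secret, prime)

# Both sides arrive at the same session key without ever transmitting it.
assert alice_session_key == bob_session_key
```

An eavesdropper sees `prime`, `root`, `alice_mixture` and `bob_mixture`, but to compute the session key would have to recover one of the secrets from a mixture, which is the hard reverse direction.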
For those less comfortable with math, Wikipedia offers a great picture that explains the process using color mixing:
Notice how the initial color (yellow) eventually turns into the same “mixed” color for both Bob and Alice. The only things sent over the open communication channel are the half-mixed colors, which are meaningless to anyone eavesdropping on the channel.
Symmetric encryption
The key exchange happens only once per session, while the connection is being established. Once the parties have agreed on the secret key, client-server communication uses symmetric encryption, which is much more efficient for transferring information because it does not incur the extra verification overhead.
Using the previously obtained secret key and the agreed encryption type, the client and the server can exchange data securely, encrypting and decrypting each other’s messages with the secret key. An attacker who taps into the channel would see only “garbage” travelling back and forth over the network.
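The defining feature of symmetric encryption, one shared key for both directions, can be shown with a toy cipher built from the standard library. This is an assumption-laden sketch: it XORs the data with a SHA-256-based keystream and reuses that keystream for every message, which is insecure. Real TLS connections use vetted ciphers such as AES-GCM or ChaCha20-Poly1305. The value of `shared_secret` is a stand-in for the number produced by the key exchange.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Stretch the key into `length` bytes using SHA-256 over a counter."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(stream[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# Stand-in for the secret both sides obtained from the key exchange.
shared_secret = 123456789
key = hashlib.sha256(str(shared_secret).encode("ascii")).digest()

message = b"GET /account HTTP/1.1"
ciphertext = xor_cipher(key, message)      # what an eavesdropper sees
plaintext = xor_cipher(key, ciphertext)    # what the other side recovers

assert plaintext == message
```

Because both sides hold the same `key`, each can encrypt what it sends and decrypt what it receives with the same cheap operation.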
Authentication
The Diffie-Hellman algorithm lets two parties obtain a private secret key. But how can the parties be sure they are actually talking to each other? This is where authentication comes in.
What if I call my buddy and we do a D-H key exchange, but it turns out that my call was intercepted and I was actually talking to someone else? I would still be communicating securely with that person, and no one else could listen in, but it is not the person I thought I was talking to. That is not safe at all!
To solve this problem we need a public key infrastructure, which makes it possible to be certain that parties are who they claim to be. This infrastructure exists to create, distribute, and revoke digital certificates. Certificates are those annoying things we have to pay for if we want our site to work over HTTPS.
But what exactly is a certificate, and how does it give us security?
Certificates (HTTPS Certificate)
A digital certificate is a file that uses a digital signature (more about that in a minute) to bind a computer’s public key to its identity. The digital signature on the certificate means that someone vouches for the fact that this public key belongs to a particular person or organization.
In effect, certificates associate domain names with a specific public key. This prevents an attacker from presenting his own public key in order to impersonate the server the client is trying to reach.
In the phone example above, a hacker can try to show me his public key while posing as my friend, but the signature on his certificate will not come from someone I trust.
For a certificate to be trusted by any web browser, it must be signed by an accredited certificate authority (CA). A CA is a company that manually verifies that the person requesting a certificate meets the following two conditions:
1. Actually exists.
2. Has access to the domain for which the certificate is requested.
Once the CA is satisfied that the applicant is real and really controls the domain, it signs the certificate for the site, in effect stamping its confirmation that the site’s public key really belongs to that site and can be trusted.
Your browser ships with a preinstalled list of accredited CAs. If a server returns a certificate that is not signed by an accredited CA, a big red warning appears. Otherwise, anyone could sign fake certificates.
So even if a hacker takes the public key of his own server and generates a digital certificate binding that key to the site facebook.com, the browser will not trust it, because the certificate is not signed by an accredited CA.
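The signing and checking steps can be sketched as follows. Everything here is a simplified assumption: the "certificate" is a plain dictionary rather than a real X.509 file, the signature is textbook RSA over a SHA-256 digest with small well-known primes and no padding, and the names `sign_certificate` and `browser_trusts`, the domain `example.com`, and the site key values are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (exponent, modulus)


def make_keypair(p: int, q: int, e: int = 65537) -> Tuple[Key, Key]:
    """Build a toy RSA key pair from two primes (far too small for real use)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (e, n), (d, n)


def digest(domain: str, site_key: int, modulus: int) -> int:
    """Hash the certificate contents and reduce the result below the modulus."""
    data = f"{domain}|{site_key}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def sign_certificate(domain: str, site_key: int, ca_private: Key) -> Dict:
    """The CA binds a domain to a public key by signing their digest."""
    d, n = ca_private
    signature = pow(digest(domain, site_key, n), d, n)
    return {"domain": domain, "public_key": site_key, "signature": signature}


def browser_trusts(cert: Dict, trusted_ca_keys: List[Key]) -> bool:
    """Accept the certificate only if a trusted CA's key verifies the signature."""
    for e, n in trusted_ca_keys:
        expected = digest(cert["domain"], cert["public_key"], n)
        if pow(cert["signature"], e, n) == expected:
            return True
    return False


# An accredited CA whose public key is preinstalled in the browser.
ca_public, ca_private = make_keypair(2**127 - 1, 2**89 - 1)
trusted_ca_keys = [ca_public]

# The genuine site gets its key signed by the accredited CA.
genuine = sign_certificate("example.com", 1234567, ca_private)

# An attacker signs his own key for the same domain with his own "CA".
_, attacker_private = make_keypair(2**107 - 1, 2**61 - 1)
forged = sign_certificate("example.com", 7654321, attacker_private)

assert browser_trusts(genuine, trusted_ca_keys)
assert not browser_trusts(forged, trusted_ca_keys)
```

The attacker can produce a certificate that looks complete, but since no key in `trusted_ca_keys` verifies its signature, the browser rejects it.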