Web architecture is simply the architecture, hardware and software, behind the World Wide Web. The Internet is part of all of our lives. It is the backbone of global communications and is accessed by about 36 million people in the UK alone each day. The Internet was once the domain of nerds and geeks, a complex matrix of blogs, databases and websites. Nowadays, since the birth of social networks such as Facebook, Twitter and Google+, the Internet has become a very intimate and interactive part of everyone's lives. This section looks at all the work that goes on behind the scenes of the world's largest network, the Internet.
To understand how the World Wide Web works, you first have to understand what the Internet really is and how the two differ. The Internet is a colossal network infrastructure that connects networks to other networks globally, and through them connects millions upon millions of devices every day. The World Wide Web is one of the services that runs on top of it: the web pages and the servers that host them. In effect, any device connected to the Internet can communicate with any other connected device, which is exactly what happens every time you connect to a website's web server. Because the Internet is so vast, devices need agreed sets of rules, known as protocols, to send data across it, such as HTTP (the Hypertext Transfer Protocol) for transferring hypertext documents.
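To make "protocol" a little more concrete, here is a minimal Python sketch of the text a browser sends when it asks a web server for a page over HTTP/1.1. It is a simplified illustration: real browsers send many more headers, and the function name `build_get_request` is hypothetical.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request, roughly what a browser sends for a page."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path and protocol version
        f"Host: {host}",         # required in HTTP/1.1 so one server can host many sites
        "Connection: close",     # ask the server to close the connection after replying
        "",                      # an empty line marks the end of the headers
        "",
    ]
    # HTTP uses CRLF ("\r\n") line endings
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_get_request("webtech.mavieson.co.uk").decode("ascii"))
```

The server reads the request line to see which page is wanted and the `Host` header to see which website it is for.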
Below is a diagram (Figure 1) outlining the basic steps your web browser goes through to request a web page from a website such as Google. The red arrows show the path of the request from your PC to the web server, and the green arrows show the return path from the web server back to your PC.
The first element required is a means of displaying the content, and for this a web browser is needed. There are many web browsers available for desktop computers, some of the most popular being Google Chrome, Mozilla Firefox, Opera, Safari and, unfortunately, Internet Explorer (if you haven't seen my Internet Explorer rant, click here to read it). In its simplest form, a browser is software that reads HTML documents and displays them. It is also responsible for locating and retrieving content from the World Wide Web, such as images, video and, of course, the HTML documents themselves. To read more about web browsers, see Webopedia's article here.
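A rough sketch of the "locating and retrieving" part: before a browser can fetch a page's images, scripts and stylesheets, it has to find their URLs in the HTML. The Python below uses the standard library's `html.parser` to do that for a handful of tags. The class name `ResourceFinder` and the example page are hypothetical, and a real browser handles far more cases (CSS `url()` references, `<base>` tags, lazy loading and so on).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect the URLs of images, media, scripts and stylesheets referenced by an HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url          # the page's own URL, used to resolve relative links
        self.resources: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = None
        if tag in ("img", "script", "video", "audio", "source") and attrs.get("src"):
            url = attrs["src"]
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            url = attrs["href"]
        if url:
            # Turn relative paths such as "logo.png" into full URLs, as a browser would
            self.resources.append(urljoin(self.base_url, url))


if __name__ == "__main__":
    finder = ResourceFinder("http://webtech.mavieson.co.uk/index.html")
    finder.feed('<img src="logo.png"><link rel="stylesheet" href="/css/site.css">'
                '<script src="https://cdn.example.com/app.js"></script>')
    print(finder.resources)  # each of these would then be requested from its server
```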
For your PC to communicate with the network, it needs a NIC (Network Interface Card). This gives your PC the ability to send and receive signals over a Local Area Network (LAN), either using the Ethernet standard with RJ-45 connectors and Cat5e/6/7 cable, or using the wireless Wi-Fi standard. The LAN originates at the router, where the subnet begins, and the router's DHCP (Dynamic Host Configuration Protocol) server assigns an individual internal IP address to each device on the network. Inside the LAN there may be devices such as switches, wireless access points and miles of Ethernet cable. For the record, the difference between a router and a switch is simply this: switches create networks by connecting PCs and other network devices together, whereas routers connect networks to each other. In the typical household a single modem router connects everything together. These usually have a built-in switch (typically 4-port), a wireless access point and a firewall to separate the users from the perils of the Internet.
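The router's job of handing out internal addresses can be modelled in a few lines. The sketch below is a toy, not a real DHCP implementation (there are no lease timers and no network messages): it assumes the common home subnet 192.168.1.0/24 with the router on 192.168.1.1, and gives each device, identified by a made-up MAC address, the next free address using Python's standard `ipaddress` module. The class name `AddressPool` is hypothetical.

```python
import ipaddress


class AddressPool:
    """Toy model of a home router's DHCP pool: hands out private addresses from one subnet."""

    def __init__(self, subnet: str = "192.168.1.0/24", router_ip: str = "192.168.1.1"):
        self.network = ipaddress.ip_network(subnet)
        self.router_ip = ipaddress.ip_address(router_ip)
        self.leases: dict[str, ipaddress.IPv4Address] = {}  # MAC address -> assigned IP
        # hosts() skips the network and broadcast addresses; the router keeps its own IP
        self._free = (ip for ip in self.network.hosts() if ip != self.router_ip)

    def request(self, mac: str):
        """Return this device's existing address, or allocate the next free one."""
        if mac not in self.leases:
            self.leases[mac] = next(self._free)  # raises StopIteration if the pool runs out
        return self.leases[mac]


if __name__ == "__main__":
    pool = AddressPool()
    print(pool.request("aa:bb:cc:dd:ee:01"))  # first device
    print(pool.request("aa:bb:cc:dd:ee:02"))  # second device gets the next address
    print(pool.request("aa:bb:cc:dd:ee:01"))  # first device keeps its address
```

Because 192.168.0.0/16 is a private range, these addresses are only meaningful inside the LAN; the router uses a single public address on the ISP's side.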
Between your modem router and the Internet, however, sits your ISP, or Internet Service Provider. In its most basic form, an ISP is a business that, for a fee, connects its customers to the Internet; it is the company that owns the network between your house and the Internet. Before we get into the 4 main types of connection between your house and the ISP, it's important to look at something above the ISP and to distinguish between the PSTN (Public Switched Telephone Network) and ISDN (Integrated Services Digital Network), as both terms are often floated around when discussing ISPs and the Internet.
The PSTN is simply the network of the world's telephone networks: the ISPs as a whole, along with all the telephone companies and so on. It's the network of old-fashioned telephone lines, the fiber-optic cables running under the Atlantic, the transmissions to and from communications satellites, cellular networks and even microwave transmission links, to name but a few. These are all interconnected through “switching centers”, allowing any telephone in the world to contact any other. ISDN, on the other hand, is a set of “communication standards for simultaneous digital transmission” over the PSTN. This can be the transmission of voice, such as telephone calls, of video, such as video conferencing, or, most importantly in our case, of data. This is where the Internet comes in. Suffice it to say that without the PSTN there would be no Internet as we know it. In many ways the PSTN and the Internet are the same thing, as they both exist on the same physical infrastructure. The main difference between the Internet and the PSTN, however, is what lies at either end of the network.
The PSTN connects telephones, fax machines and dial-up modems. These are all systems that require nothing more than the “Plain Old Telephone Service”. With these devices, when a call is made between two subscribers, a dedicated circuit is kept open for the whole duration of the “call”, including any silence. This is a massively inefficient use of bandwidth and a reminder of the old analogue past.
The Internet, on the other hand, revolves around a system of clients, nodes and servers, with user interfaces and input/output devices for information. The Internet breaks a transmission between a client and a server into many packets of data, and the routing information in each packet's header guides it to its destination. Unlike on the PSTN, the packets are independent of each other and there is no dedicated physical circuit between source and destination, so each packet can take a different route; the packets are then reassembled in the correct order at the other end. Because no circuit sits idle during silences, the Internet uses bandwidth far more efficiently than its PSTN counterparts, which also makes it a cheaper method of transmission. Nodes are essentially the Internet's equivalent of PSTN switching centers and are used to connect networks.
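The packet idea can be sketched in Python. This is a simplified model, assuming plain dictionaries stand in for real packet headers (on the real Internet, IP carries the addresses and TCP's sequence numbers handle reordering). The function names are hypothetical, and the addresses come from reserved documentation ranges, so they are purely illustrative.

```python
import random


def to_packets(message: bytes, size: int, src: str, dst: str) -> list[dict]:
    """Split a message into packets, each with a small 'header': addresses and a sequence number."""
    return [
        {"src": src, "dst": dst, "seq": i, "payload": message[offset:offset + size]}
        for i, offset in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: list[dict]) -> bytes:
    """Put packets back in order by sequence number, whatever order they arrived in."""
    return b"".join(p["payload"] for p in sorted(packets, key=lambda p: p["seq"]))


if __name__ == "__main__":
    msg = b"Packets can take different routes across the Internet."
    packets = to_packets(msg, size=8, src="198.51.100.7", dst="203.0.113.10")
    random.shuffle(packets)  # simulate packets arriving out of order via different routes
    assert reassemble(packets) == msg
    print(f"{len(packets)} packets reassembled correctly")
```

Nothing here reserves a path for the whole conversation: each packet carries enough information on its own to be routed and put back in place.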
As far as “the last mile” (the link between your house and the ISP) is concerned, there are currently 4 main wired types of connection used by ISPs around the world. The first is dial-up, an old and now obsolete method of sending and receiving data over the telephone line with a 56K modem, 56Kb/s being the maximum theoretical transfer rate. Next there is ADSL, or Asymmetric Digital Subscriber Line. Like dial-up, this uses the existing copper telephone network, but it uses frequencies not used by telephone calls, meaning it's possible to keep using the broadband even while someone is making or receiving a telephone call on the same line. ADSL speeds vary greatly because they depend on the distance between the property and the telephone exchange, ranging from about 24Mb/s download down to near dial-up speeds. ADSL also has much more bandwidth downstream (server to premises) than upstream (premises to server), so a typical connection of 10Mb/s down may only reach 2Mb/s up (hence “asymmetric”).
Fiber optic broadband has two main categories: FTTC (Fiber To The Cabinet) and FTTP (Fiber To The Premises). With FTTC, fiber optic cables are laid to the local street cabinet, increasing speeds from ADSL's flaky 24Mb/s to, more often than not, about 64Mb/s down and up to 20Mb/s up. As you can see, FTTC is still asymmetric, as it is limited by the copper section between the cabinet and the property.
FTTP is more interesting, as in some areas it is also a means of receiving cable television. Cable television is traditionally delivered over coaxial cable, and cable broadband has asymmetric speeds similar to ADSL. FTTP, however, is a pure fiber solution, and because light of many different wavelengths (colours) can travel down the same fiber, many different signals can be sent down one individual fiber line simultaneously. This means that a pure FTTP connection can be fully symmetric, with upload and download speeds the same. The speeds are then limited mainly by the equipment at either end of the fiber and by how much the ISP throttles the user; for instance, in Kansas City, Kansas, Google Fiber supports speeds of up to 1Gb/s both upstream and downstream.
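To put those speeds into perspective, here is a small Python calculation of how long an ideal transfer of a 100 MB file would take on each type of line, using the figures quoted above. It ignores protocol overhead, line noise and contention, so real transfers will be slower, and the 100 MB file size is an arbitrary assumption. Note the factor of 8: line speeds are quoted in megabits per second, while file sizes are usually given in megabytes.

```python
def transfer_seconds(size_megabytes: float, speed_mbps: float) -> float:
    """Ideal transfer time: convert megabytes to megabits (x8), then divide by the line rate."""
    return size_megabytes * 8 / speed_mbps


# (download, upload) speeds in Mb/s, using the figures quoted in the text above
LINES = {
    "ADSL (typical)": (10, 2),
    "FTTC": (64, 20),
    "FTTP (Google Fiber)": (1000, 1000),
}

if __name__ == "__main__":
    FILE_MB = 100  # arbitrary example file size
    for name, (down, up) in LINES.items():
        print(f"{name:<20} down {transfer_seconds(FILE_MB, down):7.1f} s"
              f"   up {transfer_seconds(FILE_MB, up):7.1f} s")
```

On the typical ADSL line the file takes 80 seconds to download but 400 seconds to upload, which is the asymmetry in practice; the same download over a 56Kb/s dial-up line would take roughly four hours.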
On the web server's side of the Internet, where the website is hosted, there are 3 main elements: DNS (Domain Name System) servers, the server's ISP and the web server itself. Firstly, the Domain Name System is not one individual server but an entire system of servers, so it isn't strictly accurate to place it between the Internet and the ISP as in Figure 1; however, for the sake of simplicity, I think that is where it is best placed when demonstrating how the Internet works. As the Domain Name System is so huge, I made a further diagram to explain it in more detail. The Domain Name System is essentially the Yellow Pages of the Internet: a distributed system of servers that store and cache the mappings between domain names and the IP addresses they point to. The diagram walks through how your web browser resolves the IP address of a domain, for instance http://webtech.mavieson.co.uk.
1. Your computer makes a request, using the DNS protocol, to its configured DNS server (usually one run by your ISP) for directions to http://webtech.mavieson.co.uk.
2. If that DNS server already has the domain mapped to an IP address in its cache, it skips steps 3 and 4 and simply returns the IP to the computer. If not, it asks another DNS server for directions.
3. If that DNS server has the IP in its cache, it sends the IP back to the previous server. If not, the servers keep asking further DNS servers until one finds the IP (ultimately the domain's authoritative name server, which holds the record itself).
4. The DNS server stores the IP in its cache, in case more requests are made, and sends the IP back to the computer.
5. The computer now has the directions, i.e. the IP address of http://webtech.mavieson.co.uk, and connects to the web server to get the page.
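The lookup steps above can be modelled as a chain of caching servers. This Python sketch is a deliberate simplification: the class name `CachingResolver` is hypothetical, the chain is only two servers long (real resolution goes via the root and TLD servers to the domain's authoritative server), cached answers never expire (real DNS records carry a time-to-live), and the IP address 203.0.113.10 is taken from a reserved documentation range, not the real address of webtech.mavieson.co.uk.

```python
# Hypothetical record: 203.0.113.10 is a reserved documentation address,
# NOT the real IP of webtech.mavieson.co.uk.
RECORDS = {"webtech.mavieson.co.uk": "203.0.113.10"}


class CachingResolver:
    """Toy DNS server: answers from its cache if it can, otherwise asks the next server in the chain."""

    def __init__(self, name: str, upstream: "CachingResolver | None" = None):
        self.name = name
        self.upstream = upstream          # next DNS server to ask; None means this one holds the records
        self.cache: dict[str, str] = {}   # domain -> IP answers remembered from earlier lookups
        self.queries = 0                  # how many lookups reached this server

    def resolve(self, domain: str) -> str:
        self.queries += 1
        if domain in self.cache:          # already known: answer straight away
            return self.cache[domain]
        if self.upstream is not None:     # not known: ask another DNS server for directions
            ip = self.upstream.resolve(domain)
        elif domain in RECORDS:           # end of the chain: the server that holds the record
            ip = RECORDS[domain]
        else:
            raise LookupError(f"no record for {domain}")
        self.cache[domain] = ip           # remember the answer in case more requests are made
        return ip


if __name__ == "__main__":
    authoritative = CachingResolver("authoritative")
    isp_dns = CachingResolver("isp", upstream=authoritative)
    print(isp_dns.resolve("webtech.mavieson.co.uk"))  # goes all the way up the chain
    print(isp_dns.resolve("webtech.mavieson.co.uk"))  # answered from the ISP server's cache
    print(authoritative.queries)                      # the authoritative server was asked only once
```

The second lookup never leaves the ISP's server, which is why caching keeps the Domain Name System fast despite its size.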
A URL such as http://webtech.mavieson.co.uk is made up of three or four elements, depending on the domain; this one has four. First there is the protocol, http. The Hypertext Transfer Protocol is the protocol used to connect to the domain and retrieve data. Next is webtech, which is a subdomain of mavieson, the main domain. The way I have http://mavieson.co.uk set up at the moment, the main website is on the primary domain, http://mavieson.co.uk, and the subdomains point to other services, such as http://mail.mavieson.co.uk, which redirects to the email service provided by Google Apps, and of course http://webtech.mavieson.co.uk, where this website is hosted. Finally there is the TLD, or Top Level Domain, which in my case is .co.uk (strictly speaking the TLD is .uk, with .co.uk a second-level domain beneath it). Other common TLDs include .com, .net, .biz, .org and so on. For a full list see ICANN's list here.
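Python's standard `urllib.parse.urlsplit` can separate the protocol from the host name, and the host name can then be split into its parts. The sketch below assumes a tiny hand-picked set of suffixes in place of the real Public Suffix List, so it only understands the handful of endings listed; the function name `describe_url` is hypothetical.

```python
from urllib.parse import urlsplit

# A tiny, hand-picked stand-in for the Public Suffix List (an assumption for illustration only)
KNOWN_SUFFIXES = {"co.uk", "uk", "com", "net", "org", "biz"}


def describe_url(url: str) -> dict:
    """Break a URL into protocol, subdomain, main domain and suffix."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    # Try the longest possible suffix first, so "co.uk" wins over "uk"
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in KNOWN_SUFFIXES:
            break
    else:
        raise ValueError(f"unknown suffix in {parts.hostname}")
    return {
        "protocol": parts.scheme,
        "subdomain": ".".join(labels[:i - 1]) or None,  # None when there is no subdomain
        "domain": labels[i - 1],
        "suffix": candidate,
    }


if __name__ == "__main__":
    print(describe_url("http://webtech.mavieson.co.uk"))
    print(describe_url("http://mavieson.co.uk"))
```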
The final step in Figure 1, once the IP has been resolved, is the web server itself. A web server is essentially a computer running web server software, such as Apache. Web servers are used primarily for hosting websites, but servers are also used for other purposes such as email, data storage or even gaming. When the web server receives a request for content, it sends the data back over the Internet to the client PC. This return journey bypasses the Domain Name System, because the server already knows the client's IP address: it is in the header of the packet(s) the client sent with its request. In addition, the client's IP address is unlikely to have a domain name associated with it at all, so there would be nothing to look up.
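Python's standard library includes a basic web server, which makes it easy to see that the server learns the client's IP address from the incoming connection rather than from DNS. This is a minimal sketch for local experimentation, not a production server; the handler name `HelloHandler`, the `run` helper and port 8000 are arbitrary choices.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with a tiny HTML page that echoes the client's IP address."""

    def do_GET(self):
        # The client's address comes from the TCP connection itself; no DNS lookup is involved
        client_ip = self.client_address[0]
        body = f"<html><body><p>Hello, {client_ip}</p></body></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Serve requests until interrupted (Ctrl+C)."""
    with HTTPServer((host, port), HelloHandler) as server:
        server.serve_forever()
```

Calling `run()` and then visiting http://127.0.0.1:8000/ in a browser should show a page greeting 127.0.0.1, the loopback address the browser connected from.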
Web 2.0 is, loosely, the second generation of the World Wide Web. To understand Web 2.0 we first have to go over what Web 1.0 is. Web 1.0 is the traditional World Wide Web: a top-down approach, with web servers at the top and most of the data flowing downstream to the clients, which is why ADSL worked so well for them. Web 2.0 is where collaboration and sharing came to the Internet. Blogs, wikis, social networks (Facebook, Google+, Twitter etc.), web applications and cloud-based storage are all elements of Web 2.0, where the average user can give back. At the bottom of the home page there is a Disqus comments panel, which allows you, the user, to post comments on this article if you wish. At this very moment I am writing this in a web-based application, Google Docs. The brilliant thing about web-based applications is that sharing and collaborating on work is easy, allowing for faster publication. Another good feature is that your work is kept safe: Google Docs encrypts files, and to log in I use a complex password (including upper and lower case letters, numbers and symbols) and two-step authentication via my mobile phone.
To host a website, blog, wiki etc., there are two main requirements: web hosting and a domain. There are three main types of web hosting: shared hosting, virtual private servers and dedicated servers. Shared hosting is the cheapest, with companies such as Bluehost offering hosting from as little as $3.00 a month. As it's the cheapest, one should not expect miracles; however, it is perfectly adequate for an information website, such as a site for a café or a solicitors' firm that only needs a website for reference. Many cheap web hosting providers also offer SSL certificates, allowing the traffic between the server and your clients' PCs to be secured using the HTTPS protocol.
Virtual private servers are essentially virtual machines running on a shared physical server. They offer more bandwidth and processing power for larger websites such as wikis and community websites like Ars Technica. This is a more expensive method of hosting, but it is still cheaper than an entire dedicated server. Sites such as Google, Facebook and Twitter all have a multitude of dedicated servers in their server farms; scaled down to the individual, rack space in data centres can be rented so that users can run their own dedicated server for hosting websites, or rent it out to other people for hosting.
As previously mentioned, a domain name is simply a human-readable mask for an IP address, the IP of the server where the website is hosted. To get a domain you will need to buy one from a domain name registrar. Many hosting providers, such as Bluehost and GoDaddy, offer a free domain with their hosting, acting as the registrar themselves, but standalone registrars such as domaindiscount24 and 123-reg are also available and often offer domains at lower prices. Many large companies will buy a domain, say google.com, but will also buy the same name under many other TLDs to stop customers from being misdirected, which is why Google owns google.co.uk, google.net etc. This also guards against trademark abuse and protects the company's image, for instance in case someone were to buy google.net and put up a landing page full of viruses there.