How does a web browser work?
For example, many websites actually use images as links, so you can just click the image to navigate to another page. Review our lesson on Understanding Hyperlinks to learn more. The Back and Forward buttons allow you to move through websites you've recently viewed. You can also click and hold either button to see your recent history. The Refresh button will reload the current page. If a website stops working, try using the Refresh button. Many browsers allow you to open links in a new tab.
You can open as many links as you want, and they'll stay in the same browser window instead of cluttering your screen with multiple windows. To open a link in a new tab, right-click the link and select Open link in new tab (the exact wording may vary from browser to browser). If you find a website you want to view later, it can be hard to memorize the exact web address. Bookmarks, also known as favorites, are a great way to save and organize specific websites so you can revisit them again and again.
A web browser is not the same as a search engine, despite the fact that the two are often confused. A search engine is simply a website that provides a user with links to other websites. A web browser's work starts with the user entering the desired URL (Uniform Resource Locator) into the browser's address bar. The URL's prefix specifies the protocol used to access the location.
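To make the role of the URL and its protocol prefix concrete, here is a short sketch (using Python's standard urllib module, not anything from the original text) that splits a URL into the parts a browser works with:

```python
from urllib.parse import urlparse

# Split a URL into its components; the scheme is the protocol prefix
# the browser will use to fetch the resource.
url = "https://www.example.com/path/page.html?q=browsers#history"
parts = urlparse(url)

print(parts.scheme)   # "https" -- the protocol
print(parts.netloc)   # "www.example.com" -- the host to contact
print(parts.path)     # "/path/page.html" -- the resource on that host
```

The scheme tells the browser which protocol to speak (HTTPS here), the netloc tells it which server to contact, and the path identifies the resource on that server.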
Once the resource has been located and interpreted, the browser will display the content to the user. Browsers can interpret and display content such as video, images, text, hyperlinks, and XML files. The features of available web browsers range from minimal, text-based user interfaces with plain HTML support to rich user interfaces that support a wide range of file formats and protocols.
All major web browsers allow users to access multiple websites at the same time, either in separate browser windows or in different tabs of the same window.
Most web browsers can display a list of bookmarked web pages so that the user can quickly return to them. Furthermore, most browsers can be extended with plug-ins, downloadable components that add new functionality. Web browsers are now available on a variety of devices, including computers, laptops, and mobile phones, but it took many years for them to become this widespread. All web browsers perform the same core functions; even so, many different browsers have been developed and used over time.
The very first web browser, WorldWideWeb, was created by Tim Berners-Lee in 1990. It had very simple features and a graphical interface that was not very interactive, and the bookmark feature was not available. Mosaic, first introduced in 1993, was the next major web browser to be released. It had a more appealing graphical user interface: images, text, and graphics could all be combined. Marc Andreessen was the man in charge of the Mosaic development team. Netscape Navigator, from Andreessen's company Netscape, came out in 1994; in terms of usage share in the 1990s, it was the most popular browser, and Netscape released new versions of it over time. Turning now to how a browser renders a page, it's important to understand that this is a gradual process.
For a better user experience, the rendering engine will try to display contents on the screen as soon as possible. It will not wait until all the HTML is parsed before starting to build and lay out the render tree. Parts of the content will be parsed and displayed, while the process continues with the rest of the content that keeps coming from the network. From figures 3 and 4 you can see that although WebKit and Gecko use slightly different terminology, the flow is basically the same.
Gecko calls the tree of visually formatted elements a "Frame tree", and each element is a frame. WebKit uses the term "layout" for the placing of elements, while Gecko calls it "Reflow". In Gecko, the layer between the HTML parser and the DOM tree is called the "content sink" and is a factory for making DOM elements. We will talk about each part of the flow.

Parsing — general

Since parsing is a very significant process within the rendering engine, we will go into it a little more deeply. Let's begin with a little introduction about parsing.
Parsing a document means translating it to a structure the code can use. The result of parsing is usually a tree of nodes that represents the structure of the document, called a parse tree or a syntax tree. Parsing is based on the syntax rules the document obeys: the language or format it was written in. Every format you can parse must have a deterministic grammar consisting of vocabulary and syntax rules; such a grammar is called a context-free grammar. Human languages are not such languages and therefore cannot be parsed with conventional parsing techniques.
Lexical analysis is the process of breaking the input into tokens. Tokens are the language vocabulary: the collection of valid building blocks. In human language it would consist of all the words that appear in the dictionary for that language. Parsers usually divide the work between two components: the lexer (sometimes called a tokenizer), which is responsible for breaking the input into valid tokens, and the parser, which is responsible for constructing the parse tree by analyzing the document structure according to the language syntax rules.
The lexer knows how to strip irrelevant characters like whitespace and line breaks. The parsing process is iterative. The parser will usually ask the lexer for a new token and try to match the token with one of the syntax rules. If a rule is matched, a node corresponding to the token will be added to the parse tree and the parser will ask for another token. If no rule matches, the parser will store the token internally and keep asking for tokens until a rule matching all the internally stored tokens is found.
If no rule is found then the parser will raise an exception. This means the document was not valid and contained syntax errors. In many cases the parse tree is not the final product.
Parsing is often used in translation: transforming the input document to another format. An example is compilation. The compiler that compiles source code into machine code first parses it into a parse tree and then translates the tree into a machine code document. In figure 5 we built a parse tree from a mathematical expression. Let's try to define a simple mathematical language and see the parse process.
Vocabulary: Our language can include integers, plus signs and minus signs.

Syntax:
1. The language syntax building blocks are expressions, terms and operations.
2. Our language can include any number of expressions.
3. An expression is defined as a "term" followed by an "operation" followed by another "term".
4. An operation is a plus token or a minus token.
5. A term is an integer token or an expression.
Let's analyze the input 2 + 3 - 1. The first substring that matches a rule is 2: according to rule 5 it is a term. The next match will only be hit at the end of the input: 2 + 3 - 1 is an expression. Vocabulary is usually expressed by regular expressions, and syntax is usually defined in a format called BNF. We said that a language can be parsed by regular parsers if its grammar is a context-free grammar. An intuitive definition of a context-free grammar is a grammar that can be entirely expressed in BNF.
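To make the mini language above concrete, here is a minimal Python sketch (not part of the original text; the helper names are hypothetical): the vocabulary is expressed with a regular expression, and a small parser applies the syntax rules left to right, building a nested-tuple syntax tree.

```python
import re

# Vocabulary: integers and the + / - operators, separated by whitespace.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([+\-]))")

def tokenize(text):
    """Lexer: break the input into integer and operator tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"invalid character at position {pos}")
        integer, op = match.groups()
        tokens.append(("INT", int(integer)) if integer else ("OP", op))
        pos = match.end()
    return tokens

def parse(tokens):
    """Parser: a term, then (operation, term) pairs, per rules 3-5."""
    tokens = iter(tokens)
    kind, value = next(tokens)
    if kind != "INT":
        raise SyntaxError("expected an integer term")
    tree = value
    for _, op in tokens:                 # an operator token...
        _, term = next(tokens)           # ...must be followed by a term
        tree = (op, tree, term)          # nested tuples form the syntax tree
    return tree

print(parse(tokenize("2 + 3 - 1")))   # ('-', ('+', 2, 3), 1)
```

The resulting tuple mirrors the parse tree: the outer node is the final operation, and its left child is the sub-expression matched earlier.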
For a formal definition see Wikipedia's article on Context-free grammar.

Types of parsers

There are two types of parsers: top-down parsers and bottom-up parsers. An intuitive explanation is that top-down parsers examine the high-level structure of the syntax and try to find a rule match. Bottom-up parsers start with the input and gradually transform it into the syntax rules, starting from the low-level rules until the high-level rules are met.
The bottom-up parser will scan the input until a rule is matched. It will then replace the matching input with the rule. This goes on until the end of the input. The partly matched expression is placed on the parser's stack. There are tools that can generate a parser: you feed them the grammar of your language (its vocabulary and syntax rules) and they generate a working parser.
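The shift-reduce behaviour just described can be sketched for the same mini language (again a hypothetical Python sketch, not from the original text): each token is shifted onto the stack, and whenever the top of the stack matches the "term operation term" rule, it is reduced to a single term.

```python
def shift_reduce(tokens):
    """Bottom-up parsing sketch: shift tokens onto a stack, and reduce
    the top of the stack whenever 'term op term' matches rule 3."""
    stack = []
    for token in tokens:
        stack.append(token)                              # shift
        if (len(stack) >= 3 and stack[-1][0] == "INT"
                and stack[-2][0] == "OP" and stack[-3][0] == "INT"):
            _, right = stack.pop()
            _, op = stack.pop()
            _, left = stack.pop()
            stack.append(("INT", (op, left, right)))     # reduce
    if len(stack) != 1:
        raise SyntaxError("input did not reduce to a single expression")
    return stack[0][1]

tokens = [("INT", 2), ("OP", "+"), ("INT", 3), ("OP", "-"), ("INT", 1)]
print(shift_reduce(tokens))   # ('-', ('+', 2, 3), 1)
```

Note that it produces the same tree as a top-down parse of this input; the two approaches differ in how they get there, not in the result.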
Creating a parser requires a deep understanding of parsing and it's not easy to create an optimized parser by hand, so parser generators can be very useful. WebKit uses two well-known parser generators: Flex for creating a lexer and Bison for creating a parser (you might run into them under the names Lex and Yacc).
Flex input is a file containing regular expression definitions of the tokens. Bison's input is the language syntax rules in BNF format.
As we have seen in the parsing introduction, grammar syntax can be defined formally using formats like BNF. HTML, however, cannot easily be defined by the context-free grammar that conventional parsers need. This may seem strange at first, since HTML is quite close to XML and there are lots of available XML parsers. The difference is that the HTML approach is more "forgiving": it lets you omit certain tags (which are then added implicitly), or sometimes omit start or end tags, and so on.
On the whole it's a "soft" syntax, as opposed to XML's stiff and demanding syntax. This seemingly small detail makes a world of difference. On one hand, this is the main reason why HTML is so popular: it forgives your mistakes and makes life easy for the web author.
On the other hand, it makes it difficult to write a formal grammar. So to summarize, HTML cannot be parsed easily by conventional parsers, since its grammar is not context-free. There is a formal format for defining HTML, the DTD (Document Type Definition). This format is used to define languages of the SGML family; it contains definitions for all allowed elements, their attributes and hierarchy, but it is not a context-free grammar.
There are a few variations of the DTD. The strict mode conforms solely to the specification, while other modes contain support for markup used by browsers in the past; the purpose is backwards compatibility with older content. The current strict DTD can be found on the W3C website. The output tree (the "parse tree") is a tree of DOM element and attribute nodes. DOM is short for Document Object Model. The root of the tree is the "Document" object.
The DOM has an almost one-to-one relation to the markup. It is a generic specification for manipulating documents, defined by the W3C; a specific module of the specification describes the HTML-specific elements. Browsers use concrete implementations that have other attributes used by the browser internally.
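To see the near one-to-one relation between markup and tree nodes in practice, here is a small sketch using Python's standard html.parser module (a stdlib parser used purely for illustration, not a browser engine):

```python
from html.parser import HTMLParser

class TreePrinter(HTMLParser):
    """Walk a markup snippet and record the DOM-like hierarchy:
    each start tag adds a nesting level, each end tag closes one."""
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)   # element node
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        if data.strip():                             # text node
            self.lines.append("  " * self.depth + repr(data.strip()))

printer = TreePrinter()
printer.feed("<html><body><p>Hello DOM</p></body></html>")
print("\n".join(printer.lines))
```

Each tag in the markup corresponds to exactly one node in the printed tree, with the text as a leaf under its enclosing element.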
As we saw in the previous sections, HTML cannot be parsed using the regular top-down or bottom-up parsers. The reasons are:
1. The forgiving nature of the language.
2. The fact that browsers have traditional error tolerance to support well-known cases of invalid HTML.
3. The parsing process is reentrant.
For other languages, the source doesn't change during parsing, but in HTML, dynamic code (such as script elements containing document.write() calls) can add extra tokens, so the parsing process actually modifies the input. Unable to use the regular parsing techniques, browsers create custom parsers for parsing HTML. The parsing algorithm is described in detail by the HTML5 specification. The algorithm consists of two stages: tokenization and tree construction. Tokenization is the lexical analysis, parsing the input into tokens.
Among HTML tokens are start tags, end tags, attribute names and attribute values. The tokenizer recognizes the token, gives it to the tree constructor, and consumes the next character for recognizing the next token, and so on until the end of the input.
The algorithm's output is an HTML token. The algorithm is expressed as a state machine. Each state consumes one or more characters of the input stream and updates the next state according to those characters.
The decision is influenced by the current tokenization state and by the tree construction state. This means the same consumed character will yield different results for the correct next state, depending on the current state. The algorithm is too complex to describe fully, so let's look at a simple example that will help us understand the principle. The initial state is the "Data state". When the < character is consumed, the state is changed to the "Tag open state". Consuming an a-z character then causes the creation of a "Start tag token", and the state is changed to the "Tag name state".
Each character is appended to the new token name until the > character is consumed, at which point the current token is emitted and the state changes back to the "Data state". In our case, for an input like <html><body>Hello world</body></html>, the created token is an html token. The body tag is treated by the same steps, so at this point the html and body tags have been emitted and we are back at the "Data state". We will now emit a character token for each character of Hello world. Consuming the < character of the end tag brings us back to the "Tag open state"; consuming the next character, /, causes the creation of an end tag token and a move to the "Tag name state". The new tag token is then emitted when > is consumed, and we go back to the "Data state"; the </html> input is treated the same way. When the parser is created, the Document object is also created.
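The walk-through above can be sketched as a toy state machine in Python. This hypothetical tokenize_html (not from the original text) covers only the Data, Tag open and Tag name states and their end-tag counterparts; the real HTML5 algorithm has dozens of states.

```python
def tokenize_html(text):
    """Toy HTML tokenizer: a state machine that emits start tag,
    end tag and character tokens, mirroring the example above."""
    state = "data"
    tokens, name = [], ""
    for char in text:
        if state == "data":
            if char == "<":
                state = "tag open"
            else:
                tokens.append(("char", char))       # emit a character token
        elif state == "tag open":
            if char == "/":
                state = "end tag open"
            else:
                state, name = "tag name", char      # start tag token created
        elif state == "end tag open":
            state, name = "end tag name", char      # end tag token created
        elif state in ("tag name", "end tag name"):
            if char == ">":
                kind = "start tag" if state == "tag name" else "end tag"
                tokens.append((kind, name))         # emit the tag token
                state = "data"
            else:
                name += char                        # append to the token name
    return tokens

print(tokenize_html("<html><body>Hi</body></html>"))
```

Running it on a small input shows the same sequence as the walk-through: an html start tag, a body start tag, one character token per letter, then the matching end tags.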
During the tree construction stage, the DOM tree with the Document at its root will be modified and elements will be added to it. Each token emitted by the tokenizer will be processed by the tree constructor.
For each token the specification defines which DOM element is relevant to it and will be created for this token. The element is added to the DOM tree, and also the stack of open elements. This stack is used to correct nesting mismatches and unclosed tags. The algorithm is also described as a state machine. The states are called "insertion modes". The input to the tree construction stage is a sequence of tokens from the tokenization stage.
The first mode is the "initial mode". Receiving the "html" token will cause a move to the "before html" mode and a reprocessing of the token in that mode. This causes the creation of the HTMLHtmlElement element, which is appended to the root Document object; the state will then be changed to "before head".
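A minimal sketch of the tree construction stage (hypothetical names, and far simpler than the specification's insertion modes): each start tag creates a node and pushes it onto the stack of open elements, each end tag pops the stack, and character tokens attach to whatever element is currently open.

```python
def build_tree(tokens):
    """Build a DOM-like tree from tokens. The stack of open elements
    tracks nesting; an end tag simply closes the current element
    (a simplification -- the real spec also repairs mismatches)."""
    root = {"tag": "#document", "children": []}
    open_elements = [root]                      # stack of open elements
    for kind, value in tokens:
        if kind == "start tag":
            node = {"tag": value, "children": []}
            open_elements[-1]["children"].append(node)
            open_elements.append(node)          # push onto the stack
        elif kind == "end tag":
            if len(open_elements) > 1:
                open_elements.pop()             # close the current element
        elif kind == "char":
            open_elements[-1]["children"].append(value)
    return root

tokens = [("start tag", "html"), ("start tag", "body"),
          ("char", "H"), ("char", "i"),
          ("end tag", "body"), ("end tag", "html")]
tree = build_tree(tokens)
print(tree["children"][0]["tag"])                   # html
print(tree["children"][0]["children"][0]["tag"])    # body
```

The stack is what turns a flat token sequence into nesting: body ends up inside html, and the character tokens end up inside body, exactly as in the markup.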
The "body" token is then received and handled in the same manner.

Besides HTML documents, the rendering engine can also work with other types of data with the help of certain plugins and extensions. The Networking component handles internet communication and security. The JavaScript Interpreter, as the name suggests, interprets and executes the JavaScript code embedded in a website.
The results are then sent to the rendering engine for display. The UI Backend helps to draw basic widgets such as select boxes, input boxes, windows and check boxes, using the underlying operating system's user interface methods. Whenever you click on a link or enter a URL, the browser sends and receives data to and from other parts of the web. The information it receives is rendered by the rendering engine and translated into an easily understandable format.
It is then displayed in the user interface. Loading a page involves a multi-step process including DNS resolution, an HTTP exchange between the browser and the web server, rendering, and so on.
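The first of those steps can be sketched with Python's standard library (a hypothetical sketch, not the browser's actual code): DNS resolution maps the host name in the URL to IP addresses, after which the browser would open a connection and perform the HTTP exchange. The HTTP part is shown only as comments so the sketch runs without network access.

```python
import socket
from urllib.parse import urlparse

def resolve(url):
    """Step 1: DNS resolution -- turn the URL's host name into
    the IP addresses the browser could connect to."""
    host = urlparse(url).hostname
    info = socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)
    return [entry[4][0] for entry in info]   # the address from each result

# Step 2 (sketch only): the HTTP exchange with the web server.
# import http.client
# conn = http.client.HTTPConnection("example.com")
# conn.request("GET", "/")
# body = conn.getresponse().read()   # HTML handed to the rendering engine

print(resolve("http://localhost/index.html"))   # e.g. ['127.0.0.1']
```

Only after the response body arrives does the rendering pipeline described earlier (tokenization, tree construction, layout, painting) take over.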