Friday, April 4, 2014

What is Node.js?

Saibaba  Vinayakaya namah!

Okay, big question!

Node.js, popularly known as Node, is a software platform whose main goal is to make it easy to build highly scalable, performant network applications.

It's an open source project sponsored by Joyent.

All programs on the Node platform are written in JavaScript, which makes it super exciting for the web community, since most web developers have dabbled in JavaScript on the client side.

We could create web applications, network applications, chat servers, and much more!

Node was built by Ryan Dahl, who introduced it at the European JavaScript Conference in 2009. The conference also featured tremendous speakers such as Douglas Crockford and John Resig.

Anyway, Ryan introduced the system to an enthralled audience, talked about Node and its philosophy, and demonstrated an IRC chat server. The slides for the talk can be obtained here. They are a must-read, since they explain the philosophy of Node.js.

Also, watch his talk at the conference!




So, what's so special about Node? 

Why are all these people using Node?

So, Ryan Dahl, back in 2008, was searching for a platform that would make it easy for him to effectively push data to the user, instead of having the client poll constantly, something like GMail. And he had no suitable options.

He knew how to do that in C with non-blocking sockets, which let a system keep a lot of open but idle network connections, something essential for such push technology. All the existing platforms were very tightly coupled to the idea of a server accepting a request, processing it, and eventually sending back a response. But he did not want to work in C. He was sure that, as long as everything in the system was non-blocking, the rest would not be difficult to implement.

In late 2008, Google came out with the V8 JavaScript runtime. And he had the idea of integrating his non-blocking C code with V8 and running JavaScript code on top of it. Since JavaScript had no existing socket libraries, he could start with a clean slate, and he was very excited. And so began the Node.js project, with Ryan quitting his job, working full time on the project, and releasing it to the community! You can read his story in his own words in the foreword to the book Node: Up and Running.

Node's philosophy

 "We are doing and thinking about I/O incorrectly. It has to be done differently!" ~ Ryan Dahl

Node has a completely different philosophy for developing applications that are I/O or network intensive.
It promotes non-blocking code over blocking code.

Well, all that history is not much fun without first understanding what non-blocking code means.
To understand non-blocking code, we first have to understand blocking code.

If you had a chance to look at Ryan's presentation, an example of blocking code would be:

var results = db.query("SELECT * FROM tEmployee");

So, as developers we can guess what's happening. There's a function called query on the db object. It accepts a SQL statement. It goes and executes the statement. And then it returns some results, and we store them in the results variable. Easy peasy!

Well, if we had 2 lines of code such as

var results = db.query("SELECT * FROM tEmployee");
doSomethingElse();

What would happen? When would the doSomethingElse() function call execute?

Well, not until the previous function call has returned. We are basically waiting on the query function to return, and only then can we go on to the next line.

This is what most code looks like. And we are accustomed to seeing that.
All traditional systems have code like this, where we "wait" upon some I/O or network access.

Now, let's see something else.

db.query("SELECT * FROM tEmployee", function handleResults(err, data){
   // do something with the data!
});
doSomethingElse();

Basically, this is what most Node code looks like.

What we are saying to Node is: yeah, the query function will take some time, I know, but I don't want to wait for it.
I just want something to be done when it finishes. Just take this function called handleResults and execute it whenever the query function is done and ready with results.

And by the way, please don't wait. Go ahead and do something else.

The basic difference between the two code samples is when the next line of code (the call to doSomethingElse()) executes.

In the blocking approach, we are forced to wait for execution of query to complete, thus blocking us from doing anything else.

Instead, if we just follow node's non-blocking approach, we can perform other operations while some things are going on.
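Here is a minimal sketch of this that you can run in Node. Note that the db object is hypothetical: its query function only pretends to be a database by using setTimeout to answer after 100 ms, and the order array is just there to record what ran when.

```javascript
// Hypothetical stand-in for a database client: it "answers" after 100 ms.
var db = {
  query: function (sql, callback) {
    setTimeout(function () {
      callback(null, [{ id: 1, name: 'Asha' }]);
    }, 100);
  }
};

var order = []; // records the order in which things actually run

function doSomethingElse() {
  order.push('doSomethingElse');
}

db.query('SELECT * FROM tEmployee', function handleResults(err, data) {
  order.push('handleResults');
  console.log(order.join(' -> ')); // doSomethingElse -> handleResults
});
doSomethingElse();
```

Even though the call to db.query comes first in the file, doSomethingElse() runs first, because nobody waited for the query.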

This is called the asynchronous, non-blocking, evented style of programming!

Oh my god! 3 bullet points in one.
  • Asynchronous
  • Non-blocking
  • Evented

Well, what's asynchronous programming? If one task doesn't have to wait for another task to finish before it can proceed, such code is called asynchronous. So we have the essence of multiple things being in progress side by side. Our non-blocking code example is basically asynchronous, since we are saying: hey, don't wait. Do something else. Essentially, the query is still running in the background while our code moves on, so several operations can be in flight at the same time.

We have seen what non-blocking and asynchronous code means. Now let's turn our attention to event-based programming.

What we basically did in the second example was register an event-handler function called handleResults for an event, the event being the completion of the query execution. When the query finishes executing, we want handleResults to execute.
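The same idea can be written out explicitly with Node's built-in EventEmitter. This is only a sketch: the query object and the 'done' event name are made up for illustration, and a real database library may expose a different interface.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical query object that announces its completion as an event.
var query = new EventEmitter();
var received = [];

// Register the handler first...
query.on('done', function handleResults(rows) {
  received.push(rows.length);
  console.log('got ' + rows.length + ' rows');
});

// ...and later, whoever finishes the work emits the event.
query.emit('done', [{ id: 1 }, { id: 2 }]); // prints "got 2 rows"
```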

So Node is an event-based system, where events and event handlers are very important and most code is event based.

It waits for events to happen, and when an event happens it checks for any registered event handlers and then executes them.

Basically, Node does this by sitting in a while loop and looking at a queue into which events are thrown.

It constantly queries the queue, asking: are there any new events available?

And when an event is available, it tries to handle it by looking at another internal structure and finding the registered event handlers. All the event handlers are then executed one by one.

This process of continuously looking for new events and handling them is called the event loop.
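To make the description above concrete, here is a toy event loop in plain JavaScript. It is not how Node is actually implemented; the on, emit and runLoop functions and the event names are all invented for this sketch.

```javascript
// A toy event loop: NOT Node's real implementation, just the idea.
var handlers = {};   // event name -> list of registered handler functions
var queue = [];      // events waiting to be handled
var log = [];

function on(name, handler) {
  (handlers[name] = handlers[name] || []).push(handler);
}

function emit(name, payload) {
  queue.push({ name: name, payload: payload });
}

function runLoop() {
  // Keep asking: are there any new events available?
  while (queue.length > 0) {
    var event = queue.shift();
    var registered = handlers[event.name] || [];
    // Run every handler registered for this event, one by one.
    registered.forEach(function (handler) {
      handler(event.payload);
    });
  }
}

on('request', function (url) { log.push('served ' + url); });
on('request', function (url) { if (url === '/') emit('home-hit'); });
on('home-hit', function () { log.push('home page visited'); });

emit('request', '/');
emit('request', '/about');
runLoop();
console.log(log);
// [ 'served /', 'served /about', 'home page visited' ]
```

One simplification to keep in mind: this toy loop stops as soon as the queue is empty, whereas a real event loop goes to sleep and waits for the operating system to report new events.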

Turns out there are other platforms which have event loops, and they are all based on the Reactor pattern.

If you want to understand Node's execution model, then it would be worthwhile to read this paper on the Reactor pattern.

To summarize,

Why does Node use the event-loop approach rather than a traditional thread-based approach?

Ryan in his early days of web programming was confused as to why the servers processing requests were so slow!

To increase the performance of a server handling a lot of concurrent requests, the system has to be designed carefully.

Traditional web servers use one thread per connection, which is how they execute requests concurrently.

With such an approach, there would be, let's say, 500 threads running at the same time for 500 concurrent requests (requests coming in at the same time).

With a lot of threads there is parallel execution, but each thread needs its own resources allocated in memory, and switching between threads adds overhead.

These servers degrade in performance as the number of requests increases, and memory usage quickly becomes high!

A comparison between two web servers, Apache and nginx, is provided by Ryan in his presentation, and it is probably the core observation to take away.

nginx performs better than Apache for a large number of requests. And the memory usage of Apache explodes with the number of requests being handled concurrently, while that of nginx stays nearly the same.

This is because of a key architectural difference.

nginx uses an event loop for processing concurrent requests, while Apache dedicates a thread (or a process, depending on how it is configured) to each connection.

So Node prefers the event-loop architectural style and waits for events to happen in a single thread of execution.

Whenever a new request comes in, Node invokes the event handler and goes back to listening for more requests.
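A small web server shows this in action, using Node's built-in http module. Treat it as a sketch: the handler name and the response text are my own choices, and port 0 simply asks the operating system for any free port (you would normally pick a fixed one such as 8000).

```javascript
var http = require('http');

// Node calls this handler once per incoming request,
// then goes back to waiting for the next one.
function handleRequest(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello from ' + req.url + '\n');
}

var server = http.createServer(handleRequest);

// Port 0 = "any free port"; we only accept local connections here.
server.listen(0, '127.0.0.1', function () {
  console.log('Listening on http://127.0.0.1:' + server.address().port + '/');
});
```

There is no thread per connection anywhere in this code. One handler function serves every request, and the process keeps running until the server is closed.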

So why isn't everyone else using the event-loop approach?

According to Ryan, the reasons are twofold.

Cultural
First of all, we are always taught to write blocking code, right from day one. We are made to wait for input.

Infrastructural
Secondly, although highly efficient event loop implementations exist for Ruby (EventMachine) and Python (Twisted), not all of the libraries for those languages are non-blocking. Only the non-blocking ones can be used with an event loop. So developers are confused as to how they can use the other libraries that they are familiar with!

Why was JavaScript chosen as the language to code Node applications?

So, as mentioned earlier, in 2008 Ryan was experimenting with sockets in C, and he could do non-blocking sockets there. But he did not want to write C. In fact, he did not want other programmers to have to write C either.

So he experimented with Lua, Haskell, etc.

In late 2008, when Google released the V8 JavaScript engine, Ryan thought: why not JavaScript?

JavaScript is already event based in the browser. Functions are very powerful objects. And closures, which let a function retain state, made JavaScript a viable option for the event-loop architecture.
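To see why closures matter for event handlers, here is a small sketch. The makeCounter function and the labels are hypothetical names chosen for this example.

```javascript
// makeCounter returns a handler that "remembers" its own count via a closure.
function makeCounter(label) {
  var count = 0; // private to this handler, kept alive between calls
  return function handleEvent() {
    count += 1;
    return label + ' seen ' + count + ' time(s)';
  };
}

var onClick = makeCounter('click');
var onKey = makeCounter('keypress');

onClick();
console.log(onClick()); // click seen 2 time(s)
console.log(onKey());   // keypress seen 1 time(s)
```

Each handler carries its own state with it, so when the event loop calls it much later, it still knows where it left off without any global variables.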

Events and event handlers are nothing new to people who have programmed the web on the client side for a long time.

The paradigm of event callbacks is built into JavaScript on the browser side, and every serious JavaScript developer understands it well.

For example, we see code like this way too often!

$('button').on("click", function handleButtonClick(){
  // Do something
});

Here we are registering an event handler for the mouse click event on all button elements on the page.

Whenever a button is clicked, the event is triggered, and the function associated with that event runs.

What's the reason for Node's Success?

JavaScript has been around since 1995, when Brendan Eich first developed it. It's the most ubiquitous language, sitting in every browser on every computer.

With Crockford's teachings, John Resig's jQuery, and the myriad client-side libraries that popped up between 2006 and 2011, the joy of JavaScript development became known to the world.

Node was an instant hit with the developer community, because it was built on top of a language that most people were already familiar with.

And then there is Node's module system: its libraries form a huge ecosystem, with many developers contributing to Node's success.

Node's Architecture

Node has these core components. It's basically a wrapper around them.



Here is another talk that Ryan gave at Yahoo, introducing Node and talking about its architecture!



Node is built on top of the V8 JavaScript runtime (the same one that runs inside the Chrome web browser). The runtime allows us to execute JavaScript code inside the node process.

It provides a set of non-blocking libraries for users to interact with the filesystem, HTTP and TCP/IP.
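For instance, reading a file with the built-in fs module looks like this. It is a sketch: the scratch file name is arbitrary, and the steps array exists only to show the order in which things happen.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Write a small scratch file so the example has something to read.
var file = path.join(os.tmpdir(), 'node-intro-example.txt');
fs.writeFileSync(file, 'hello from disk');

var steps = [];

// Non-blocking read: Node starts the read and returns immediately.
fs.readFile(file, 'utf8', function (err, text) {
  if (err) throw err;
  steps.push('read finished: ' + text);
  console.log(steps.join('\n'));
});

steps.push('readFile returned, still waiting for the data');
```

The line after the readFile call runs before the callback does, which is the same pattern as the db.query example earlier.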

Early versions of Node used an event loop implemented in C called libev, and a thread pool, another C implementation, called libeio. Newer versions wrap both jobs in a single library called libuv.

Node integrates all these systems together so that we can build applications in JavaScript and execute them on V8, in an event-driven style, on top of that event loop.

Node also uses the CommonJS module system. Developers can create and publish modules to a public repository called the npm registry.

Why is Node fast?

Node is super fast. It's meant to be fast in terms of I/O, not mathematical computations, graphics, etc.

It's fast because of
  • The event-loop implementation, which is written in C
  • Direct interaction with the system layer
  • The idea of non-blocking code (all code in Node is meant to be asynchronous and non-blocking!)
  • The V8 runtime, which executes the code you write for Node

So it becomes an easy choice for applications such as web applications, web servers, network servers, etc.
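The flip side of "fast at I/O, not at computation" is easy to demonstrate. In this sketch, a 200 ms busy loop stands in for a heavy calculation (the number is arbitrary), and a timer that asked to run as soon as possible has to wait for it.

```javascript
var start = Date.now();
var timerDelay = null;

// Ask for a callback "as soon as possible".
setTimeout(function () {
  timerDelay = Date.now() - start;
  console.log('timer fired after ' + timerDelay + ' ms');
}, 0);

// CPU-bound work: spin for about 200 ms without yielding to the event loop.
while (Date.now() - start < 200) {
  // busy-wait
}
```

The timer cannot fire until the loop finishes, because there is only one thread running our JavaScript. This is why long computations inside a Node server delay every other request.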

Resources

We have talked about a lot of things in here. Some pointers to different things that we have discussed.






