"Let it crash ..."


Arie van Wingerden <xapwing@...>
 

No flamewars please!!!
Just an honest question!

In Erlang there is the "let it crash" philosophy.
So far, from what I understand of Pony, it will try to avoid a crash at all costs.

In Erlang (originally from the telephony world) it seems to be an appropriate philosophy, because processes can do "unsafe" things in a telephony exchange.

Does this mean that Pony wouldn't be a good fit for this type of application?

And would the whole Pony instance crash when one actor "fails"?


Malthe Borch
 

On 9 March 2017 at 11:52, Arie van Wingerden <xapwing@...> wrote:
In Erlang there is the "let it crash" philosophy.
So far, what I get from Pony, it will try to avoid a crash at all cost.

The "let it crash" philosophy is about knowing that pure-Pony
processes can terminate – or "crash" – safely without ever bringing
down the runtime.

You can design a system around this principle of runtime safety. It is,
to my understanding, exactly the same principle as in Erlang. A
consequence is that you can expect to be able to observe process
lifetimes from within the system – knowing that the observer, also
being part of the system, will most likely still be alive and kicking.

\malthe


 

Sean T. Allen

Hi Arie,

There's a lot of nuance missing from your question. I'm going to try to provide a little, but my time at the moment is relatively short.
What does "let it crash" mean in the context of Erlang? We can't really discuss differences between Pony and Erlang without discussing what "let it crash" means in a variety of contexts.

Let's take one example:

In Erlang, you can cause a process to crash by sending it a message it doesn't understand. One assumption you can make in this case is that the message indicates an error and the process should exit (aka crash). This makes a good deal of sense in the context of Erlang's original creation. It was meant to be used on switches that should never shut down. You should be able to update a running system via hot code loading. What should the default response be to an actor getting a message it doesn't understand? "Crash" seems reasonable to me, where "crash" means that single actor crashing. It is certainly better than silently eating the error.
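
To make "crash means that single actor" concrete, here is a minimal toy sketch in Python. It is not Erlang's or Pony's actual semantics; the Actor class, its deliver method, and the UnknownMessage exception are all hypothetical names invented for this illustration. An unknown message stops only the actor that received it, and other actors keep working.

```python
class UnknownMessage(Exception):
    """Raised when a toy actor receives a message it has no handler for."""


class Actor:
    """Hypothetical toy actor: handlers are looked up by message name."""

    def __init__(self, name, handlers):
        self.name = name
        self.handlers = handlers
        self.alive = True

    def deliver(self, msg, *args):
        if not self.alive:
            return None  # messages to a dead actor are simply dropped
        handler = self.handlers.get(msg)
        try:
            if handler is None:
                raise UnknownMessage(msg)
            return handler(*args)
        except Exception:
            # The failure is contained: only this actor stops.
            self.alive = False
            return None


counter = Actor("counter", {"add": lambda a, b: a + b})
logger = Actor("logger", {"log": lambda text: text.upper()})

counter.deliver("add", 1, 2)       # handled normally
counter.deliver("subtract", 5, 3)  # no handler: counter "crashes"
logger.deliver("log", "still running")  # logger is unaffected
```

After the unknown message, counter.alive is False while logger.alive stays True, which is the isolation property being discussed.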

Now, let's try to apply that to Pony. A Pony program will not compile if you try to send an actor a message that it doesn't understand. This includes not only the behavior name but also the types of the parameters. There is no "crash" scenario in this case. Darach does like to refer to this as "let it crash at compile time" but ¯\_(ツ)_/¯, it's not the Erlang case.

There are pros and cons to each approach. 

Re: the original Erlang use case and applicability of Pony for it:

The original Erlang use case called for zero downtime, with hot code loading to satisfy that requirement. Pony does not do hot code loading at this point in time, so if that is a hard requirement for your project, Pony is not appropriate. Note this has nothing to do with "let it crash". "Let it crash" is a completely reasonable strategy based on how Erlang operates and what it was designed for.

When we have distributed Pony and if we have hot code loading, this will create some interesting problems with Pony. People like to say Pony is like Erlang but fast. That is a very misleading generalization that really does a disservice to Erlang. Pony has taken inspiration from Erlang but is not Erlang. As it stands right now, Pony is designed for writing high-performance, highly concurrent applications that run in a single Unix process. If you need to have multiple processes work together, then communication across Unix processes is your problem.

The Pony type system requires all errors to be handled or the program will not compile. This is fundamentally different than Erlang at a very basic level and this will lead Pony to be a very different language than Erlang. Yes, they both have actors and message passing but there are fundamental core differences that make a comparison of the two not particularly meaningful.

To be very clear for anyone reading in the future, I am not calling either approach better. They are different. That is all. 

-Sean-





Arie van Wingerden <xapwing@...>
 

Thanks for the detailed answers!




Scott Fritchie
 

On Thu, Mar 9, 2017 at 7:55 AM, Sean T. Allen <sean@...> wrote:
In Erlang, you can cause a process to crash by sending it a message it doesn't understand. One assumption you can make in this case is that it indicates an error and process should exit (aka crash).

Hi, everyone.  I've been lurking for quite a while but haven't been able to chime in yet on Pony stuff.  I *can* ring about Erlang stuff.

With Erlang processes, the receiver has a lot of control over what it receives.  All messages are technically sent to a process's mailbox.  The process decides when it will try to pull a message out of the mailbox and what pattern(s) to match when it finally does call 'receive'.

If the patterns are very specific, then a stray/bad/... message may sit in the mailbox forever.  (Translation: memory leak.)  If the pattern match is very general, then the stray/... message will be pulled out of the mailbox and used.  Later pattern matches and/or explicit logic could cause the process to crash, sure.
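
The mailbox behavior described above can be sketched roughly in Python. This is only a toy model under stated assumptions: the Mailbox class and its send/receive methods are hypothetical names, a predicate function stands in for an Erlang pattern, and receive returns None where a real Erlang process would block waiting for a match.

```python
class Mailbox:
    """Toy mailbox: messages wait in arrival order until a receive matches them."""

    def __init__(self):
        self._queue = []

    def send(self, msg):
        self._queue.append(msg)

    def receive(self, matches):
        """Remove and return the oldest message for which matches(msg) is true.

        Returns None if nothing matches; non-matching messages stay queued.
        """
        for i, msg in enumerate(self._queue):
            if matches(msg):
                return self._queue.pop(i)
        return None

    def __len__(self):
        return len(self._queue)


def is_job(msg):
    # A "very specific" pattern: only ("job", payload) tuples match.
    return msg[0] == "job"


box = Mailbox()
box.send(("stray", 42))
box.send(("job", "a"))
box.send(("job", "b"))

first = box.receive(is_job)    # skips the stray message
second = box.receive(is_job)
third = box.receive(is_job)    # nothing left that matches
leftover = len(box)            # the stray message is still queued
general = box.receive(lambda msg: True)  # a catch-all pattern finally removes it
```

With only the specific pattern in use, the stray message would stay queued indefinitely, which is the leak described above; the catch-all receive is what pulls it out.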

What's missing from the picture?

Implicit in most Erlang/OTP programs is using the OTP process behaviors for your app.  The supervisor behavior (aka a design/architecture "pattern") allows you to build a tree out of processes.  Each node in the tree has a role of "supervisor" or "worker".  Workers do real work, whatever the app does.  The sole role of the supervisor is to detect & react to workers crashing.

Most of the time, a supervisor will restart a crashed worker.  MTTR (mean time to recovery) is frequently measurable in a millisecond or two(*).
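
As a rough illustration of the supervisor/worker split, here is a single-threaded Python sketch. It is not the OTP supervisor API: supervise, make_worker, and max_restarts are hypothetical names chosen for this example, and the restart cap is only loosely modeled on the idea that a supervisor gives up after too many restarts.

```python
def supervise(make_worker, jobs, max_restarts=3):
    """Feed jobs to a worker, replacing the worker whenever it raises."""
    worker = make_worker()
    restarts = 0
    results = []
    for job in jobs:
        try:
            results.append(worker(job))
        except Exception:
            if restarts >= max_restarts:
                raise  # too many crashes: the supervisor itself gives up
            restarts += 1
            worker = make_worker()  # "let me start another worker"
            results.append(None)    # the failed job is not retried in this sketch
    return results, restarts


def make_worker():
    def worker(job):
        return 10 // job  # crashes with ZeroDivisionError when job == 0
    return worker


results, restarts = supervise(make_worker, [1, 0, 5, 0, 2])
```

The worker contains no error handling at all; recovery lives entirely in the supervisor, so the two bad jobs cost two restarts and the remaining jobs still complete.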

Erlang developers like "let it crash" because the app's design is using supervisors, so any crash is very quickly followed by "let me start another worker!"  "Let it crash" sounds very provocative & memorable, but "let me start another worker" isn't.  Too bad, because supervisors are a really fun way to design robust apps.

-Scott

(*) Such small MTTR means that you can put some shockingly buggy code into production and often still meet SLA goals despite crazy-frequent failures.


 

Sean T. Allen

Scott,

You raise an excellent point that I glossed over. 

-Sean-

