|  | Speculation about Dijkstra's primitives and how small each f |  | |
| | | Robert Maas, http://tinyu |  |
| Posted: Fri Aug 15, 2008 7:23 am Post subject: Speculation about Dijkstra's primitives and how small each f |  |
| |  | |
Dijkstra took function calls (including passing parameters to called function and collecting return values) for granted, treating each as an atomic action, and then said that a newly defined function must combine these atomic actions in only three ways: - Sequence (which can involve dataflow: return values from early calls feed into parameters to later calls) - Decision (IF/THEN or CASE/SELECT/COND) - Repeating loop with test for exit (WHILE or FOR or MAP; i.e. LOOP in CL) This restriction formed the basis of structured programming and functional programming (except that in functional programming a fourth type of combiner is allowed: LAMBDA/CURRY).
Extreme advocates for the language 'Forth' decided that each 'word' (corresponds to 'function' or 'procedure' or 'subroutine' or 'method' in most other languages) must combine atomic actions in just one (1) of those three ways. If you have something that uses two or more of Dijkstra's combiners, then the function/word ought to be split into smaller functions/words each of which uses just one (1) of Dijkstra's combiners.
But tonight I got a new idea: In a "smart system", there are several "facts" which are "known" by the software and used within the code that comprises the system. Perhaps there should be a Dijkstra/Forth style restriction that each function must know at most one fact. If you have a function that knows two (or more) unrelated facts, that function ought to be split into separate functions each of which knows just one of those facts. Furthermore each fact must be known by just one function.
Note that if you have a dataset that contains a large number of specific facts all of the same type, accessed in a uniform manner, that entire set should be treated like a table in a relational database: Each access method for that table/dataset should be a single function. So a single access method for a whole set of related facts in a RDBS or similar dataset counts as just one "fact" per my proposed restriction.
What do y'all think of my idea? |
| |
| | | Calum Grant |  |
| Posted: Mon Aug 18, 2008 3:11 pm Post subject: Re: Speculation about Dijkstra's primitives and how small ea |  |
| |  | |
Robert Maas, LINK wrote:
| Quote: | Dijkstra took function calls (including passing parameters to called function and collecting return values) for granted, treating each as an atomic action, and then said that a newly defined function must combine these atomic actions in only three ways: - Sequence (which can involve dataflow: return values from early calls feed into parameters to later calls) - Decision (IF/THEN or CASE/SELECT/COND) - Repeating loop with test for exit (WHILE or FOR or MAP; i.e. LOOP in CL) This restriction formed the basis of structured programming and functional programming (except that in functional programming a fourth type of combiner is allowed: LAMBDA/CURRY).
Extreme advocates for the language 'Forth' decided that each 'word' (corresponds to 'function' or 'procedure' or 'subroutine' or 'method' in most other languages) must combine atomic actions in just one (1) of those three ways. If you have something that uses two or more of Dijkstra's combiners, then the function/word ought to be split into smaller functions/words each of which uses just one (1) of Dijkstra's combiners.
But tonight I got a new idea: In a "smart system", there are several "facts" which are "known" by the software and used within the code that comprises the system. Perhaps there should be a Dijkstra/Forth style restriction that each function must know at most one fact. If you have a function that knows two (or more) unrelated facts, that function ought to be split into separate functions each of which knows just one of those facts. Furthermore each fact must be known by just one function.
Note that if you have a dataset that contains a large number of specific facts all of the same type, accessed in a uniform manner, that entire set should be treated like a table in a relational database: Each access method for that table/dataset should be a single function. So a single access method for a whole set of related facts in a RDBS or similar dataset counts as just one "fact" per my proposed restriction.
What do y'all think of my idea?
|
Well if you are interested in managing "facts" then Prolog is a great place to start. One of my favourite languages.
As for restricting the responsibilities of functions - in OOP there's a principle of single responsibility, so each class/function has one responsibility. E.g. if you have a class "egg_and_beans" then that's two responsibilities and the class should be split into two. However if you have a class "breakfast" then that's coherent, and can again be one class.
The comment on classes could equally apply to functions. They should have one clear responsibility to avoid confusion and maximise reuse.
I've philosophised about this "paradox" that classes can do more yet have fewer responsibilities. I think it's because the level of abstraction has been raised.
What constitutes a "fact"? You might need a hierarchical concept of facts, which matches your levels of abstraction. But that's just structured programming. The flip of decomposition is composition. You might also need to combine facts in different ways, and you're back to logic programming which can express such things nicely.
I'd draw you away from the idea of "facts", since unless you are doing logic programming, you are really interested in the "purpose" of a component, and functions do more than manage facts.
As for each function knowing one "fact", there's a related principle called the "Law of Demeter" (LoD). As a guiding principle it's very useful, but is probably too much work to observe strictly.
LoD states that functions should not make assumptions about the internals of the objects it uses. It forbids code like toaster.handle.depress() (since you shouldn't probe the specific features of the toaster), and prefers code like toaster.toast(bread), thereby pushing the implementation into the object providing the functionality. Actually that makes sense, since a future toaster could use a laser-beam to toast, in which case is doesn't have a handle, but could provide a toast() method. LoD is designed to make code more robust.
Also you talk about functions managing facts, but facts can be stored as well as computed. There's a great programming paradigm which I particularly like, which has decomposition, and binds functions to data at various levels of abstraction. It's called object-oriented programming! ;-)
Cheers,
Calum -- Calum Grant - LINK |
| |
| | | Robert Maas, http://tinyu |  |
| Posted: Fri Sep 26, 2008 3:11 pm Post subject: Re: Speculation about Dijkstra's primitives and how small ea |  |
| |  | |
REM> Dijkstra ... combine ... in only three ways: REM> Sequence ... Decision ... Repeating loop REM> Extreme advocates for the language 'Forth' decided that each 'word' ... REM> must combine atomic actions in just one (1) of those three ways. REM> But tonight I got a new idea: ... Perhaps there should be a REM> Dijkstra/Forth style restriction that each function must know at REM> most one fact.
| Quote: | From: Calum Grant <n...@hotmail.com Well if you are interested in managing "facts" then Prolog is a great place to start. ...
|
I don't see how that directly relates to my point. My idea is that people who write computer software would make sure to introduce some fact (about how to implement an algorithm, such as the fact that a particular kind of data record has three fields with various names and data types) in only one place per such fact. Thus for example there'd be an accessor who knows the implementation of LastName within a Person record, and nobody else would know that particular implementation, everyone else must go through the accessor method. Or a sorted map might be implemented as a MBBT or as a red/black tree or as an AVL tree, and nobody outside that particular sorted-map class needs to know which of those three algorithms is actually being used. Or a conversion package between English and metric units knows the conversion rate between pounds and kilograms, and even it knows that in just one place, mutiplying or dividing as appropriate, or generating an inverse once to allow multiplying at all times, rather than having that constant handcoded in both direct and inverted form separately.
I don't see how Prolog will help much with telling the person writing the software when he/she has made a mistake and coded the same "fact" or related "facts" such as conversion and inverse conversion in two separate places.
| Quote: | As for restricting the responsibilities of functions - in OOP there's a principle of single responsibility, so each class/function has one responsibility. E.g. if you have a class "egg_and_beans" then that's two responsibilities and the class should be split into two. However if you have a class "breakfast" then that's coherent, and can again be one class.
|
Yes, that's part of the basis for my generalization of that idea.
| Quote: | The comment on classes could equally apply to functions. They should have one clear responsibility to avoid confusion and maximise reuse.
|
Yes, that was part of my intent. It doesn't matter whether we're dealing with what are called in various languages "functions" or "methods" or "words" or "procedures" or "subroutines" etc., it's all the same in regard to this principle of single owner of the key fact that drives the algorithms.
Ideally it should be possible to abstract out what are usually design patterns to become single-location abstraction-patterns, such as might be implemented as "macros" (actually source-parse-tree transformations prior to execution) in Lisp. C++ made a crude attempt in this direction with "templates", but IMO that attempt is so narrow-minded as to be grossly insufficient for this grand plan. For example, there's a design pattern of callable producer and calling consumer for stream/sequence data (Java notation): handle = new HandleOnStream(params); while (handle.hasNext()) { item = handle.getNext(); process(item); } or alterately with late signal of end-of-stream, needed for filtered streams where you can't know in advance whether you will be able to get more filter-valid data or not: handle = new HandleOnStream(params); while (true) { item = handle.getNext(); if (item == NULL) break; process(item); } It shouldn't be necessary to copy&paste either of those templates over and over and over. All you should need is a "macro" for either design pattern that takes parameters for each variable part and builds the corresponding code automatically (Lisp notation): (iterateOverStream #'HandleOnStream listOfParams #'process) That assumes that new HandleOnStream returns an iterator object which satisfies the interface of supporting getNext which returns NULL whenever end-of-sequence has been reached.
| Quote: | I've philosophised about this "paradox" that classes can do more yet have fewer responsibilities.
|
I don't believe classes do more, at least not in a D/P sense. Exactly the same "business logic" (data-processing code) gets executed regardless of whether you have formal OOP methods in OOP classes, traditional Lisp functions, traditional Algol procedures, tradition FORTRAN subroutines, or even early-machine-language hackery to fake subroutines that hadn't yet been invented.
| Quote: | What constitutes a "fact"?
|
Something problem-domain-related that is needed to perform a step of D/P (data processing), whether it's a data tranformation such as net = gross - cost, or a logical decision such as anyone over 30 hours per week must not be hourly, or a resource locator such as a filename or URL or name of table within database etc.
Now you can say that holding a filename in a function and then calling that function merely begs the question. Instead of needing to know the fileame, the calling code needs to know the name of the function to call. But needing to know the filename itself is not as robust as needing to know the function name, because functions can be kept in packages in a hierarchial way such that there are a very limited number of undetectable mistakes, whereas filenames are more easy to accidently get wrong and thereby cause havoc in the file system. Also functions have parameter types, which can be checked at compiletime and/or runtime to *detect* most possible mistakes and signal a error for orderly shutdown of the program before havoc ensues. The error (exception) can include enough legible information to quickly track down the coding mistake.
| Quote: | You might need a hierarchical concept of facts, which matches your levels of abstraction. But that's just structured programming.
|
Well, if each fact is encapsulated into a separate function, either the actual fact there if "constant", or a link to a configuration entry if potentially different from year to year in a way that can be anticipated, then you don't need an extra hierarchy to match the structued program, the structured program gives the hierarchy to you already. Now if configuration entries are structured, to make easy to write software that uses new items not previously used or to find already-used items and follow the links back to the software unit that is responsible for all uses of it, that might be a good idea if it's not a burden to implement.
| Quote: | The flip of decomposition is composition. You might also need to combine facts in different ways, and you're back to logic programming which can express such things nicely.
|
Ideally generic algorithms (such as C++ "templates", and Lisp "macros" implementing "special forms"), and ordinary functions can be defined separately, once each, and then configuring their combination for a special application is rather trivial. In some cases you have some parameters that are fixed at combination time (they are a constant for this application) and other parameters that remain variable (so that the resultant fuction can be applied several different ways within the overall application), so in effect you are doing high-level currying. Is that what you have in mind?
| Quote: | I'd draw you away from the idea of "facts", since unless you are doing logic programming, you are really interested in the "purpose" of a component, and functions do more than manage facts.
|
It sounds like you misunderstood my use of "fact". As I used the term, one major fact is the way that a merge-sort is done. You break the overall file into short runs or singleton runs then merge them together 2->1 until only one run remains. This breaks down into several facts, how you make the initial runs, how you do a single merge, and how you organize the cascade of runs until only one remains, so perhaps three facts are inherent to this algorithm itself, with external (known elsewhere) facts related to how linked lists are manipulated and how the sorting function works and where the key is found within the record. All of those external facts could be supplied as functional parameters to the merge-sort algorithm, where each functional parameter must satisfy some particular interface. (In the SORT function in Common Lisp, the list-processing primitives are taken a priori, CAR/CDR/CONS, but the comparison function and key function are given as functional parameters. This works only because list-processing is totally generic type, a single type of CONS cell that handles all possible elements in the CAR position, so you can do *all* variants of sort-merge with a single CAR/CDR/CONS set of functions. In a function that doesn't have generic elements in CAR position, we'd need a separate set of list-processing primitives for each type of object a list can hold, so the list-processing set-of-functions would need to be a parameter also.)
| Quote: | As for each function knowing one "fact", there's a related principle called the "Law of Demeter" (LoD). As a guiding principle it's very useful, but is probably too much work to observe strictly.
|
I looked it up: <http://en.wikipedia.org/wiki/Law_of_Demeter> This seems closely tied in with the concept of interfaces. I don't necessarily mean the formal type of interfaces in Java and Seed7, rather just the idea that the API for a software library won't change drastically, that code you write assuming a particular calling interface will remain valid indefinitely. So when you are writing code that directly deals with handles/pointers on data objects managed by some other module, you stick to calling functions/methods per the published API and don't make any assumptions about what happens "behind the curtain" or "under the hood". (Unlike the Wizard of Oz or your personal automobile, the software library somebody else wrote is generally actually working as advertised, modulo just a few bugs that have been discovered from time to time, and doesn't break down over time.) And of course that the API is "complete" in the sense of providing all the functionality that you would need from that type of data object. (Note: By "type" I can mean either formal OOP class or intentional type interpreted on top of a primitive or OOP class.)
| Quote: | The fundamental notion is that a given object should assume as little as possible about the structure or properties of anything else (including its subcomponents).
|
I hope this allows for an API that returns components of specific types, whereupon the caller can then call whatever library mangages *that* type of data object for further processing of it? For example, a PERSON object might have a way to finding out the NAME, and such a NAME object might have a way of finding out whether the name is Western or Oriental or Nickame or some other sub-type, and in each case appropriate methods would be available to learn the family name or given name etc. as much as possible for each such sub-type. Accessors for components wouldn't *always* be desirable, just in some cases, OK? And some accessors might require a password or crypto-key etc., or a specific authorization certificate public-key signed by an authority above the module managing the confidentia data. And the return value might not be the data value itself but a public-key encrypted version of the data which only the authorized caller can decipher. OK? (Actually the ability to return an OOP object rather than just a primitive type such as string supports such data-access protection.)
| Quote: | LoD states that functions should not make assumptions about the internals of the objects it uses. It forbids code like toaster.handle.depress() (since you shouldn't probe the specific features of the toaster), and prefers code like toaster.toast(bread), thereby pushing the implementation into the object providing the functionality.
|
Maybe in this specific example the call is too knowledgeable of internals, that all toasters have a "handle" and that such handle when depressed performs a useful function that is the same across all toasters. But I'm not convinced this syntax should be forbidden in all cases. For example, should it be forbidden to contact the CEO of a company by doing company.ceo.sendMessage("...") ? If companies have several officers, each of which is a person, with methods for getting each such person object, and if for each person there's a preferred (default) e-contact method, but each person also has several other methods such as getName or checkIfAnyReceivedMessages or getImages etc., what's wrong with first getting this or that person-object related to the company then calling one of the person methods on that person-object? I don't see why you would be required to tell the company object directly the message you want to send to the CEO, or tell the company every time you want to see a photo of the CEO, and I don't see why the company object should have a gazillion methods such as sendCeoMessage sendCfoMessage getCeoImages getCfoImages etc. I think it should be a matter of judgement which is better, mainObject.relatedObjectAccessor.doSomethingWithRelatedObject or mainObject.invokeUltimateTaskSomehow for any particular case.
| Quote: | Actually that makes sense, since a future toaster could use a laser-beam to toast, in which case is [sic] doesn't have a handle, but could provide a toast() method.
|
I have a "toaster oven" that requires rotating a timer knob instead of depressing anything. I imagine anything with a laser would likewise be multi-function, not just a "toaster".
| Quote: | LoD is designed to make code more robust.
|
Whether it accomplishes that goal is an open question, depending on particular circumstances.
| Quote: | Also you talk about functions managing facts, but facts can be stored as well as computed.
|
I hope you're talking about the same topic I'm talking about, namely facts in the sense of how to perform tasks, i.e. algorithms and the primitive data that is essential to them. For example, the fact of how to do a merge is the following algorithm: You have two input streams, one output stream, and a comparison function that tells you whether record#A is less than record#B in some sense. While both input streams still have data, compare the next record of each stream, and if record#A is less than record#B then move record#A from input stream#A to output stream, else move record#B from input stream#B to output stream. (The entire main merge algorithm then follows by copying all the rest of the data from the one remaining input stream that still has input to output the output stream, thus has a second fact about what to do when one of the streams is exhausted. Typically the *entire* merge algorithm also has methods for opening each of the input streams and the output stream and closing them at the end.) In general these algorithmic facts are both stored and computed, i.e. you store the algorithm in the main memory and the execute that code at the appropriate time. I'm not sure that's what you are talking about.
| Quote: | There's a great programming paradigm which I particularly like, which has decomposition, and binds functions to data at various levels of abstraction. It's called object-oriented programming!
|
AFAIK that's not correct. In OOP, data objects are bound to their specific class objects, which in turn are bound to their parents, and somewhere along the chain from the specific type to the most generic type there's a link to the most specific method.
But OOP has a fundamental problem: It ties together all the types of actions for a given class of object, forcing all that disparate code to be located in one place (modulo the possibility of some methods being linked to the most specific class and others to more generic classes), while splitting up all the uses of a given action according to all the various classes that implement it. The opposite organization, of collecting together all the code for doing the same task on lots of different kinds of objects, isn't usually allowed. The other alternative, of having each action+objectType combo defined in a separate place, also isn't usually allowed.
OOP is one way of organizing software, and is not always the best way. I accept that it's sometimes useful, but I don't glorify it as you do.
So let me know if you like my idea for explicitly separating the concept of language-defined datatypes (a.k.a. "internal" or "implementational" datatypes) vs. intentional datatypes, and treating intentional datatypes as the most important when organizing data-processing algorithms/softwareLibraries, and if you'll offer to work with me to organize the world's archive of useful software per which intentional datatypes each software module operates upon, basically a re-organization of what I started with my "CookBook/Matrix". |
| |
|
|