ENGR 1330 Computational Thinking with Data Science

Last GitHub Commit Date: 31 January 2021

Lesson 4 Loops and Flowcharts:




Objectives

1) Develop awareness of loops, and their utility in automation.

2) Develop awareness of flowcharts as a tool for documenting, visualizing, and communicating algorithms and processes.


Repetition and Loops

Computational thinking (CT) concepts involved are:

The action of doing something over and over again (repetition) is called a loop. A loop repeats a portion of code a finite number of times until a process is complete. Repetitive tasks are very common and essential in programming: loops save time in coding, minimize coding errors, and leverage the speed of electronic computation.

Loop Analogs

If you think of any mass manufacturing process, the same steps are applied again and again. This is true even for something very simple such as preparing a peanut butter sandwich.

Consider the flowchart in Figure 1; it represents a decomposition of sandwich assembly, but at a high level -- for instance, Gather Ingredients contains a lot of substeps that would need to be decomposed if fully automated assembly were to be accomplished. Nevertheless, let's stipulate that this flowchart will indeed construct a single sandwich.

Figure 1 Supervisory Flowchart Sandwich Assembly (adapted from http://www.str-tn.org/subway_restaurant_training_manual.pdf)

If we need to make 1000 peanut butter sandwiches, we would then issue a directive to either:

1) Implement sandwich assembly, then repeat 999 times (the repeat is the loop structure). This is a serial structure: 1 sandwich artist doing the same job over and over again.

OR

2) Implement 1000 sandwich-assembly threads. This is a parallel structure: 1000 sandwich artists each doing the same job once.

In general, because we don't want to idle 999 sandwich artists, we would choose the serial structure, which frees 999 people to ask the existential question "would you like fries with that?"

All cynicism aside, an automated process such as a loop is typical in computational processing.

Aside: NVIDIA CUDA and AMD OpenCL compilers can detect the structure above and, if there are enough GPU threads available, create the 1000 sandwich artists (1000 GPU threads) and run the process in parallel -- the actual workload is unchanged in a thermodynamic sense, but the apparent time (in human terms) spent in sandwich creation is a fraction of the serial approach. This parallelization is related to unrolling the loop, a pretty common optimization step during compilation. This kind of programming is outside the scope of this class.

The main attractions of loops are that they save coding time, minimize coding errors, and leverage the speed of electronic computation.

There are two main types of loops based on the repetition control condition: for loops and while loops.

For Loop (Count controlled repetition structure)

Count-controlled repetition is also called definite repetition because the number of repetitions is known before the loop begins executing. When we do not know in advance the number of times we want to execute a statement, we cannot use count-controlled repetition. In such an instance, we would use sentinel-controlled repetition.

A count-controlled repetition will exit after running a certain number of times. The count is kept in a variable called an index or counter. When the index reaches a certain value (the loop bound) the loop will end.

Count-controlled repetition requires a counter (index) variable, an initial value for the counter, an increment (or decrement) applied on each pass, and a condition that tests whether the loop bound has been reached.

We can use both for and while loops for count-controlled repetition, but the for loop in combination with the range() function is more common.

Structured FOR loop

We have seen the for loop already, but we will formally introduce it here. The for loop executes a block of code repeatedly until the condition in the for statement is no longer true.

Looping through an iterable

An iterable is anything that can be looped over - typically a list, string, or tuple. The syntax for looping through an iterable is illustrated by an example.

First a generic syntax

for a in iterable:
    print(a)

Notice our friends the colon : and the indentation.
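For instance, looping over a short list works the same way (the list contents here are illustrative, not from the lesson):

fruits = ["apple", "banana", "cherry"]   # an illustrative list (an iterable)
for a in fruits:
    print(a)                             # prints each item on its own line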

The range() function to create an iterable

The range(begin,end,increment) function will create an iterable starting at a value of begin and advancing in steps defined by increment (begin += increment); the sequence stops before end (the end value itself is not included).

So a generic syntax becomes

for a in range(begin,end,increment):
    print(a)

The examples that follow are count-controlled repetition (increment skip if greater)

Example for loops
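The original example cells are not reproduced here; a minimal count-controlled sketch using range() might look like this (the values and names are illustrative):

# count from 0 up to (but not including) 10 in steps of 2
for i in range(0, 10, 2):
    print(i)                        # prints 0, 2, 4, 6, 8

# accumulate a running total with a counter
accumulator = 0
howmany = 5
for i in range(1, howmany + 1):     # i takes the values 1, 2, 3, 4, 5
    accumulator = accumulator + i
print("sum of 1 to", howmany, "is", accumulator)   # prints: sum of 1 to 5 is 15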

Sentinel-controlled repetition.

When loop control is based on the value of what we are processing, sentinel-controlled repetition is used. Sentinel-controlled repetition is also called indefinite repetition because it is not known in advance how many times the loop will be executed. It is a repetition procedure for solving a problem by using a sentinel value (also called a signal value, a dummy value or a flag value) to indicate "end of process". The sentinel value itself need not be a part of the processed data.

One common example of using sentinel-controlled repetition is when we are processing data from a file and we do not know in advance when we would reach the end of the file.

We can use both for and while loops for sentinel-controlled repetition, but the while loop is more common.

Structured WHILE loop

The while loop repeats a block of instructions inside the loop while a condition remains true.

First a generic syntax

while condition is true:
    execute a
    execute b
    ...

Notice our friends the colon : and the indentation again.

Example while loops
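The original example cells are not reproduced here; a minimal sentinel-controlled sketch (the data list and the sentinel value -999 are illustrative) is:

data = [12.1, 33.4, 4.5, -999, 8.8]   # -999 is the sentinel marking "end of process"
total = 0.0
i = 0
while data[i] != -999:                # keep accumulating until the sentinel is reached
    total = total + data[i]
    i = i + 1
print("Sum before sentinel:", total)  # prints 50.0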

Nested Repetition

Nested repetition is when a control structure is placed inside of the body or main part of another control structure.
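A minimal illustrative sketch is a small multiplication table, where the inner for loop runs completely for each pass of the outer loop:

for i in range(1, 4):              # outer loop
    for j in range(1, 4):          # inner loop (nested inside the outer loop body)
        print(i, "x", j, "=", i * j)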

break to exit out of a loop

Sometimes you may want to exit the loop when a certain condition different from the counting condition is met. Perhaps you are looping through a list and want to exit when you find the first element in the list that matches some criterion. The break keyword is useful for such an operation.

For example, run the following program:
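The original program is not reproduced here; a minimal sketch consistent with the discussion below (break when j == 6, then a second loop testing j == 7) is:

for j in range(0, 10, 2):    # j takes the values 0, 2, 4, 6, 8
    if j == 6:
        break                # exit the loop as soon as j == 6
    print(j)                 # executes 3 times (j = 0, 2, 4)

for j in range(0, 10, 2):
    if j == 7:
        break                # j == 7 never happens, so the loop runs to completion
    print(j)                 # executes all 5 times (j = 0, 2, 4, 6, 8)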

In the first case, the for loop only executes 3 times before the condition j == 6 is True and the loop is exited. In the second case, j == 7 never happens, so the loop completes all its anticipated traverses.

In both cases an if statement was used within a for loop. Such "mixed" control structures are quite common (and pretty necessary). A while loop contained within a for loop, with several if statements, would be very common, and such a structure is called nested control. There is typically an upper limit to nesting, but the limit is pretty large - easily in the hundreds. It depends on the language and the system architecture; suffice to say it is not a practical limit except possibly for general-domain AI applications.


We can also automate mundane activities by leveraging loops, arithmetic, and format codes to make useful tables, like the one sketched below.
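A minimal sketch (the table of squares and cubes is illustrative, not from the lesson):

print("%5s %10s %10s" % ("x", "x**2", "x**3"))   # header row using format codes
for x in range(1, 6):
    print("%5d %10d %10d" % (x, x**2, x**3))     # formatted data rows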

The continue statement

The continue instruction skips the remainder of the loop body for the current iteration and moves on to the next iteration. It is best illustrated by an example.
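A minimal illustrative sketch:

for k in range(1, 8):
    if k % 2 == 0:     # k is even
        continue       # skip the print below and move on to the next k
    print(k)           # prints only the odd values: 1, 3, 5, 7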

The try, except structure

An important control structure (and a pretty cool one for error trapping) is the try, except statement.

The statement controls how the program proceeds when an error occurs in an instruction. The structure is really useful to trap likely errors (divide by zero, wrong kind of input) yet let the program keep running or at least issue a meaningful message to the user.

The syntax is:

try:
    do something
except:
    do something else if "do something" returns an error

Here is a really simple, but hugely important example:
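The original cell is not reproduced here; a minimal sketch matching the description below (x fixed at 12, y counting down from 12 to -1, with the divide-by-zero trapped) is:

x = 12
y = 12
while y >= -1:
    try:
        print("x =", x, "y =", y, "x/y =", x / y)
    except ZeroDivisionError:                      # trap the divide-by-zero at y == 0
        print("x =", x, "y =", y, "cannot divide by zero -- keep running")
    y = y - 1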

So this silly code starts with x fixed at a value of 12 and y starting at 12 and decreasing by 1 until y equals -1. The code computes and prints the ratio of x to y; at one point y is equal to zero and the division would be undefined. By trapping the error the code can issue us a message and keep running.

Modify the script as shown below, run it, and see what happens.

Flowcharts

What is a Flowchart?

A flowchart is a type of diagram that represents a workflow or process. A flowchart can also be defined as a diagrammatic representation of an algorithm, a step-by-step approach to solving a task.

Figure 2 Repair Flowchart for a Lamp https://en.wikipedia.org/wiki/Flowchart

The flowchart shows the steps as boxes of various kinds, and their order by connecting the boxes with arrows. This diagrammatic representation illustrates a solution model to a given problem. Flowcharts are used in analyzing, designing, documenting or managing a process or program in various fields.

There is a symbol convention (a language) as depicted in Figure 3 below (from: https://en.wikipedia.org/wiki/Flowchart)

Figure 3 Flowchart Symbols https://en.wikipedia.org/wiki/Flowchart

IBM engineers implemented programming flowcharts based upon Goldstine and von Neumann's unpublished report, "Planning and coding of problems for an electronic computing instrument, Part II, Volume 1" (1947), which is reproduced in von Neumann's collected works.

The flowchart became a popular tool for describing computer algorithms, but its popularity decreased in the 1970s, when interactive computer terminals and third-generation programming languages became common tools for computer programming, since algorithms can be expressed more concisely as source code in such languages. Often pseudo-code is used, which uses the common idioms of such languages without strictly adhering to the details of a particular one.

Nowadays flowcharts are still used for describing computer algorithms. Modern techniques such as UML activity diagrams and DRAKON charts can be considered extensions of the flowchart.

Nearly all flowcharts focus on some kind of control, rather than on the particular flow itself! While quaint today, they are an effective way to document processes in a program and visualize structures. We recommend you get in the habit of making rudimentary flowcharts, at least at the supervisory level (like the sandwich chart above).

How are they useful?

(paraphrased from https://www.breezetree.com/articles/top-reasons-to-flowchart)

Sometimes it's more effective to visualize something graphically than it is to describe it with words. That is the essence of what flowcharts do for you. Flowcharts explain a process clearly through symbols and text. Moreover, flowcharts give you the gist of the process flow in a single glance. The following are some of the more salient reasons to use flowcharts.

Process Documentation / Training Materials Another common use for flowcharts is to create process documentation. Although this reason overlaps with regulatory and quality management requirements (below), many non-regulated businesses use flowcharts for their documentation as well. These can range in form from high-level procedures to low-level, detailed work instructions.

You may think that this applies mainly to large organizations, but small companies can greatly benefit from flowcharting their processes as well. Small enterprises need to be nimble and organized. Standardizing their processes is a great way to achieve this. In fact, the popular entrepreneurial book The E-Myth Revisited: Why Most Small Businesses Don't Work and What to Do About It by Michael Gerber is based on the premise that small businesses are more likely to succeed if they treat their operations like a franchise. In a nutshell, this means standardizing and documenting their business processes. There's no better way to do that than with flowcharts, right?

Training materials are often created using flowcharts because they're visually stimulating and easy to understand. A nicely laid out flowchart will gain and hold the reader's attention when a block of text will often fail.

Workflow Management and Continuous Improvement Workflows don't manage themselves. To ensure that you are meeting your customers' needs, you need to take control of your business processes. The first step to workflow management is to define the current state of your processes by creating an "As-Is Flowchart". That allows you to analyze your processes for waste and inefficiency. After you have identified areas for process improvement, you can then craft new flowcharts to document the leaner processes.

Programming Information technology had a big influence on the use and spread of flowcharts in the 20th century. While Dr. W. Edwards Deming was advocating their use in quality management, professionals in the data processing world were using them to flesh out their programming logic. Flowcharts were a mainstay of procedural programming; however, with the advent of object-oriented programming and various modeling tools, the use of flowcharts for programming is no longer as commonplace as it once was.

That said, even within the scope of object-oriented programming, complex program logic can be modeled effectively using a flowchart. Moreover, diagramming the user's experience as they navigate through a program is a valuable prerequisite to designing the user interface. So flowcharts still have their place in the world of programming.

Troubleshooting Guides Most of us have come across a troubleshooting flowchart at one time or another. These are usually in the form of decision trees that progressively narrow the range of possible solutions based on a series of criteria. The effectiveness of these types of flowcharts depends on how neatly the range of problems and solutions can fit into a simple True/False diagnosis model. A well-done troubleshooting flowchart can cut problem-solving time greatly.

Regulatory and Quality Management Requirements Your business processes may be subject to regulatory requirements such as Sarbanes-Oxley (SOX), which requires that your accounting procedures be clearly defined and documented. An easy way to do this is to create accounting flowcharts for all your accounting processes.

Similarly, many organizations fall under certification requirements for quality management systems - such as ISO 9000, TS 16949, or one of the many others. In such environments, flowcharts are not only useful but in certain clauses they are actually mandated.

Sorting (Important Flow Control Cases)

Advanced/Optional Topic

A frequent task in data science, engineering, etc. is the seemingly mundane task of sorting or ordering things. Here we explore a couple of simple sorting algorithms, just to show some of the thoughts that go into such a task, then will ultimately resort to the internal sorting routines built into Python.

Sorting is frequently necessary when data are being handled; for example, in integration and differentiation the data are usually presented to the various algorithms in ascending or descending order (at least on the x-axis). One may have tables of numbers representing one or more explanatory variables and one or more responses. At times we may need to arrange these tables in an order dictated by one or another of these variables. Alternatively, we may want to find the median value or upper quartile of such a list -- this task requires sorting. When sorting, one can also carry along operations to maintain correspondence with other lists (for lack of a better name, let's call this sort-and-carry). Tasks that fall under the broad category of sorting are:

The choice of method depends on the size of the list that needs to be sorted. If the list is short (perhaps $N < 50$ elements) then straight insertion is fast enough, concise, and simple to program. For a long list ($N > 1000$ elements) Quicksort is faster, but achieves the speed by use of extra memory. Heapsort is also good for large lists, and is an in-place routine.

Python lists have a built-in sort() method that modifies the list in-place and a sorted() built-in function that builds a new sorted list from an iterable. So when sorting needs to be done, you should use the built-in tools. However, because it is a useful programming construct, the three sorting algorithms are presented as Python primitive codes.
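For example (sorted() and list.sort() are standard Python built-ins; the list values follow the bubble sort discussion below):

mylist = [1003, 3.2, 55.5, -0.0001, -6, 666.6, 102]
print(sorted(mylist))   # sorted() builds and returns a new ordered list
mylist.sort()           # sort() orders the list in place
print(mylist)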

Bubble Sort

The bubble sort is a place to start despite its relative slowness. It is a pretty reviled algorithm (read the Wikipedia entry), but it is the algorithm that a naive programmer might cobble together in a hurry, and despite its shortcomings (it's really slow and inefficient), it is robust.

Here is a description of the sorting task as described by Christian and Griffiths (2016) (pg. 65):

"Imagine you want to alphabetize your unsorted collection of books. A natural approach would be just to scan across the shelf looking for out- of-order pairs - Wallace followed by Pynchon, for instance - and flipping them around. Put Pynchon ahead of Wallace, then continue your scan, looping around to the beginning of the shelf each time you reach the end. When you make a complete pass without finding any more out-of-order pairs on the entire shelf, then you know the job is done. This process is a Bubble Sort, and it lands us in quadratic time. There are n books out of order, and each scan through the shelf can move each one at most one position. (We spot a tiny problem, make a tiny fix.) So in the worst case, where the shelf is perfectly backward, at least one book will need to be moved n positions. Thus a maximum of n passes through n books, which gives us O(n2) in the worst case. For instance, it means that sorting five shelves of books will take not five times as long as sorting a single shelf, but twenty-five times as long."

Converting the word description into Python is fairly simple. We will have a vector of n numbers (we use a vector because it's easy to step through the different positions), and on each pass we scan through the vector comparing adjacent pairs; whenever a pair is out of order we exchange the two values, so the largest remaining value ends up at the right end of the vector by the end of the pass. Repeated passes leave the next-largest values in place until the whole vector is ordered. If we desire a decreasing order, simply change the sense of the comparison.

The algorithm defines an array and then sorts by repeated passes through the array. The program (outside of the sorting algorithm) is really quite simple.
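The original listing is not reproduced here; a minimal sketch of the idea (the names array, irow, and jcol are illustrative, chosen to match the narrative that follows):

array = [1003, 3.2, 55.5, -0.0001, -6, 666.6, 102]   # initial, unordered values
howmany = len(array)

for irow in range(howmany - 1):              # repeated passes through the array
    for jcol in range(howmany - 1):          # compare adjacent pairs on each pass
        if array[jcol] > array[jcol + 1]:    # out of order?
            swap = array[jcol]               # then exchange the pair
            array[jcol] = array[jcol + 1]
            array[jcol + 1] = swap
    print("pass", irow + 1, ":", array)      # show the array after each pass

print("sorted array :", array)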

In the script we see that the program assigns the values to the vector named array; the initial order of the array is $[1003, 3.2, 55.5, -0.0001, -6, 666.6, 102]$. The smallest value in the example is -6 and it appears in the fifth position, not the first as it should.

The first pass through the array will move the largest value, 1003, in sequence to the right until it occupies the last position. Repeated passes through the array move the remaining largest values to the right until the array is ordered. One can consider the values of the array at each scan (the irow-th pass) as a series of transformations -- in practical cases we don't necessarily care about the intermediate values, but here, because the size is manageable and we are trying to get our feet wet with algorithms, we can look at them. The sequence of results (transformations) after each pass through the array is shown in the following list:

  1. Initial value: [1003, 3.2, 55.5, -0.0001, -6, 666.6, 102].
  2. First pass: [3.2, 55.5, -0.0001, -6, 666.6, 102, 1003].
  3. Second pass: [3.2, -0.0001, -6, 55.5, 102, 666.6, 1003].
  4. Third pass: [-0.0001, -6, 3.2, 55.5, 102, 666.6, 1003].
  5. Fourth pass: [-6, -0.0001, 3.2, 55.5, 102, 666.6, 1003].
  6. Fifth pass: [-6, -0.0001, 3.2, 55.5, 102, 666.6, 1003]. Sorted, fast scan.
  7. Sixth pass: [-6, -0.0001, 3.2, 55.5, 102, 666.6, 1003]. Sorted, fast scan. We could probably add additional code to break from the scans when we have a single pass with no exchanges (like the last two scans) -- while meaningless in this example, for larger collections of things, being able to break out when the sorting is complete is a nice feature.

Insertion Sort

The next type of sorting would be to select one item and locate it either left or right of an adjacent item based on its size -- like sorting a deck of cards. Perhaps a better description -- again using the bookshelf analogy from Christian and Griffiths (2016) (pg. 65) -- follows:

"You might take a different tack -- pulling all the books off the shelf and putting them back in place one by one. You'd put the first book in the middle of the shelf, then take the second and compare it to the first, inserting it either to the right or to the left. Picking up the third book, you'd run through the books on the shelf from left to right until you found the right spot to tuck it in. Repeating this process, gradually all of the books would end up sorted on the shelf and you'd be done. Computer scientists call this, appropriately enough, Insertion Sort. The good news is that it's arguably even more intuitive than Bubble Sort and doesn't have quite the bad reputation. The bad news is that it's not actually that much faster. You still have to do one insertion for each book. And each insertion still involves moving past about half the books on the shelf, on average, to find the correct place. Although in practice Insertion Sort does run a bit faster than Bubble Sort, again we land squarely, if you will, in quadratic time. Sorting anything more than a single bookshelf is still an unwieldy prospect."

A Python implementation of a straight insertion sort appears in the next subsection. The script is quite compact, with indentation and line spacing used to keep track of the scoping. The sort works as follows: take an element of the array (start with the second and work to the right) and put it into a temporary location (called swap in the script). Then compare it to the values on its left; while a value to the left is larger, shift that value one slot to the right, otherwise the values are already in order. Repeat, starting at the next element; when all elements have been traversed the resulting vector is sorted. The script prints the transformation after each pass through the outer loop.

Straight Insertion

The straight insertion sort is the algorithm a card player would use to sort cards. Pick out the second card and put it into order with respect to the first; then pick the third card and insert it into sequence with the first two; continue until the last card is picked out and inserted. Once the last card is sequenced, the result is a sorted deck (list). A Python implementation of such an algorithm is:
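A minimal sketch of such a script (the temporary variable is named swap, as in the description above):

array = [1003, 3.2, 55.5, -0.0001, -6, 666.6, 102]

for i in range(1, len(array)):         # pick out the next "card"
    swap = array[i]                    # hold it in a temporary location
    j = i - 1
    while j >= 0 and array[j] > swap:  # shift larger values one slot to the right
        array[j + 1] = array[j]
        j = j - 1
    array[j + 1] = swap                # insert the held value into its correct slot
    print("pass", i, ":", array)       # show the transformation after each outer pass

print("sorted array :", array)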

It is probably useful to put this into a functional structure:
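A sketch of one way to do that (the function name insertion_sort is illustrative):

def insertion_sort(values):
    """Return a new list with the values in ascending order (straight insertion)."""
    result = list(values)              # copy so the caller's list is left unchanged
    for i in range(1, len(result)):
        swap = result[i]
        j = i - 1
        while j >= 0 and result[j] > swap:
            result[j + 1] = result[j]
            j = j - 1
        result[j + 1] = swap
    return result

print(insertion_sort([1003, 3.2, 55.5, -0.0001, -6, 666.6, 102]))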

Merge Sort

A practical extension of these slow sorts is called the Merge Sort. It is an incredibly useful method. One simply breaks up the items into smaller arrays, sorts those arrays, then merges the sub-arrays into larger arrays (now already sorted), and finally merges the last two arrays into the final, single, sorted array. Here is a better description, again from Christian and Griffiths (2016):

" ... information processing began in the US censuses of the nineteenth century, with the development, by Herman Hollerith and later by IBM, of physical punch-card sorting devices. In 1936, IBM began producing a line of machines called \collators" that could merge two separately ordered stacks of cards into one. As long as the two stacks were themselves sorted, the procedure of merging them into a single sorted stack was incredibly straightforward and took linear time: simply compare the two top cards to each other, move the smaller of them to the new stack you're creating, and repeat until finished.

The program that John von Neumann wrote in 1945 to demonstrate the power of the stored-program computer took the idea of collating to its beautiful and ultimate conclusion. Sorting two cards is simple: just put the smaller one on top. And given a pair of two-card stacks, both of them sorted, you can easily collate them into an ordered stack of four. Repeating this trick a few times, you'd build bigger and bigger stacks, each one of them already sorted. Soon enough, you could collate yourself a perfectly sorted full deck - with a final climactic merge, like a riffle shuffle's order-creating twin, producing the desired result. This approach is known today as Merge Sort, one of the legendary algorithms in computer science."
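A minimal recursive sketch of the collating idea (the function names merge and merge_sort are illustrative, not from the lesson):

def merge(left, right):
    """Collate two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # move the smaller "top card" to the new stack
            merged.append(left[i])
            i = i + 1
        else:
            merged.append(right[j])
            j = j + 1
    merged.extend(left[i:])              # one stack is exhausted; append the remainder
    merged.extend(right[j:])
    return merged

def merge_sort(values):
    if len(values) <= 1:                 # a list of 0 or 1 items is already sorted
        return list(values)
    mid = len(values) // 2               # split, sort each half, then collate
    return merge(merge_sort(values[:mid]), merge_sort(values[mid:]))

print(merge_sort([1003, 3.2, 55.5, -0.0001, -6, 666.6, 102]))

The merge function performs the linear-time collation described in the quote; the recursive splitting in merge_sort builds the bigger and bigger already-sorted stacks.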

There are several other variants related to Merge Sort, with Quicksort and Heapsort being close relatives.

Heapsort

Need narrative here

Lexicographical Sorting

Need narrative here

I conclude the section on sorting with one more quoted passage from Christian and Griffiths (2016) about the value of sorting, which is directly relevant to a lot of data science:

"The poster child for the advantages of sorting would be an Internet search engine like Google. It seems staggering to think that Google can take the search phrase you typed in and scour the entire Internet for it in less than half a second. Well, it can't - but it doesn't need to.

If you're Google, you are almost certain that (a) your data will be searched, (b) it will be searched not just once but repeatedly, and (c) the time needed to sort is somehow less valuable than the time needed to search. (Here, sorting is done by machines ahead of time, before the results are needed, and searching is done by users for whom time is of the essence.) All of these factors point in favor of tremendous up-front sorting, which is indeed what Google and its fellow search engines do."

References

  1. Computational and Inferential Thinking Ani Adhikari and John DeNero, Computational and Inferential Thinking, The Foundations of Data Science, Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND) Chapters 3-6 https://www.inferentialthinking.com/chapters/03/programming-in-python.html

  2. Learn Python the Hard Way (Online Book) (https://learnpythonthehardway.org/book/) Recommended for beginners who want a complete course in programming with Python.

  3. LearnPython.org (Interactive Tutorial) (https://www.learnpython.org/) Short, interactive tutorial for those who just need a quick way to pick up Python syntax.

  4. Brian Christian and Tom Griffiths (2016) ALGORITHMS TO LIVE BY: The Computer Science of Human Decisions Henry Holt and Co. (https://www.amazon.com/Algorithms-Live-Computer-Science-Decisions/dp/1627790365)