Monday, April 30, 2018

Summation

At this point I ought to cover a few details I may have omitted on my journey. I expect, however, that this checkpoint doesn't signal the end of my bout with examining the Latin square problem. I examined it briefly freshman year with mixed results. This year showed marked improvement; I look forward to potentially continued results.

While I discussed backtracking to an extent, I did not discuss my data structures here employed. In truth, it would appear that my method did not use any data structures. Many data types were employed, notably queues and stacks. I used a queue when attempting to solve the square given pieces. The queue kept track of each incomplete square before it was reprocessed to continue toward the end goal. The stack acted as the function call stack when I checked to see how many possible solutions to a partial Latin square there were. It differed by allowing me to count how many completed squares there were without abandoning the total once one was found.

There never truly was any test data for this experiment. Each run (each square) was generated randomly at runtime. However, I did model the algorithm off of the GRC OTG page. It used an example domain "amazon" and demonstrated certain special edge cases to deal with that random data may include.

Ultimately the project worked as intended for the first tier (generating a square) but not so well for the second tier (recalling the square). Incrementally, I have found various minor coding errors that have potentially hindered progress. It's possible that my approach as a whole, though, even when properly executed, involves some fatal incorrect assumptions. This clarity may come with time. One consideration I am inclined to pursue at the moment is the potential neglecting of the potential for restarting the algorithm with a new word or group of characters. I also may be making a false assumption about the start of the sequences. However, allowing the perpendicular direction to vary freely is inaccurate as it must line up with the end of the previous segment. Blindly adding that modification creates an endless cycle.

A video of this project in action can be found here

Continuing Backwards

This week I've been taking a closer look at solving for these squares. One of the tests involves waiting for "leaf squares" -- where the square only has one solution -- and checking it against the existing data. I stumbled across a peculiar loophole where once the dimensions of the square grew larger than my traditional four or five in depth, the potential solution counter would enumerate what quite obviously is a large number of potential squares. All for the purpose of simply checking if there was one or more than one. Needless to say that was nipped in the bud.

In preparation for presentations and recording a video, I also set up a simple text interface to actually allow for run-time configuration. It does the simple tasks of demonstrating a few of the features completed and the feature that tries its best.

I may have touched on this last week but I imagine that future steps for this algorithm is checking against flipped relations. However, before I tackle that, I've been reviewing the existing code as my understanding is that this approach of checking forward and backward in all positions should find all four valid squares. As such, my progress in that regard has slowed. Once I figure out the oversight there, I should be able to focus more on optimizing the approach.

Operating the brute force approach somewhat shows its incremental nature. I picture it essentially as adding numbers in a system, especially with respect to changing bases. Each time you loop through the ones place, the tens place increments and the ones start over. It is the same case with these moving character groupings but more apparent with larger squares.

Ultimately, if nothing else, I hope to solve this problem, potentially accepting inefficiency as a side effect. Currently, 4x4 squares can be completely enumerated with the data set but will not return a valid square. 6x6 and onward tends to become bogged down in calculation. 5x5 is a mixture with the added bonus of a mystery runtime exception that is eluding me.

Wednesday, April 25, 2018

Cracking Wide Open... Sorta






This week I took a stab at working backwards. The working idea is to shift collections of known letters from combining a domain and password hash to find a valid table. This has shown mixed results.

I've started out testing on small squares for various reasons. With rather quick response times, we get a large number of results from our shifting algorithm. One part builds the shifted piece and checks if it's still a valid square at that shifting. Then it passes off to essentially the original square building routine to check how many valid squares can be pieced together from the simple setup. In other words, it makes and plays Sudoku with itself.

This algorithm may have a few troubles. Latin squares can be flipped horizontally, vertically, or both. This gives us four potential matches from individualized spatial relationships alone. The problem is, this currently gives one of them. Ideally the algorithm should be apt enough to find all four. I imagine the issue may either be that it prematurely ends before working through all the possible shifting permutations of some of the more incomplete squares OR there really only should be one solution by this method. 

The image here shows a few mostly complete squares. It also attempts to take the latest filled square and check against the original square's hashing. As we can see, it leaves considerable room for improvement.

I hope to continue working at the shifting operation to check that it covers as many relevant permutations as possible. It also seems that the hash generation function has a bug, given that it shows consecutive "bbb" which cannot appear in a square.

Wednesday, April 18, 2018

Week One Done






Next I may endeavor to improve this algorithm to prevent erratic spikes in timing or simply move on to the rebuilding algorithm. Of course, that will start with small dimensions. 

I managed to apply a simple recursive backtracking algorithm to this off the grid problem. Writing in Java, I have a script to generate Latin squares; it also can use these to hash a password. There may be a better approach to the recursive approach to rectifying the square cell collisions. When the "problem cell" was set long ago, many other combinations have to be tested in vain. This may be solved by using an actual stack as opposed to the functional stack and scanning backwards until the problem cell is found. However, precisely how we define this actual "problem cell" is unknown to me.

The axes on the below graph are: vertically, average millisecond generation duration; horizontally, square dimensions. Clearly up until 20x20 squares, generation is somewhat trivial. When increasing the iteration count from 20 to 50 and 1000, we numerically see the evidence for the occasional spike in duration. A sample pool of showed a surprise issue with 23x23, for example. One consideration is completely wiping out the row and column of a particular cell when there are no valid choices for the cell. This would take care of figuring out what the true "problem cell" would be, though it may complicate the recursive approach. 
Repeated latin square generation iterations show unreliability








Wednesday, April 11, 2018

Introductory Post

My project will involve examining the OffTheGrid Latin Square password storage system. In this case, a Latin Square is an arrangement of N letters in an N by N matrix where no letter is in a row or column with itself. It was first brought to my attention by my CSIT advisor about a year ago and it seems ready for a round two. The exact approach here is somewhat multidimensional in that there are a few different avenues to reach the same goal. Ultimately it would be ideal to examine the strength of this grid system given a few different factors such as grid size, domain name length, amount of password/domain pairs, etc. Also, attempting different algorithms and environments for improved code runtime would be interesting. 

In this particular case, my goal will be to first find a working algorithm to generate a valid Latin Square of arbitrary grid size. Second, I will need to be able to generate the password "hash" from the given table. Finally, though this is a large jump, I can look into what information is afforded by having those tiny pieces of information about the original grid and consider the viability of piecing it back together. My predilection is that the branching factor is too high to have much luck with a search algorithm but we shall see.

For me, round one involved a rather iterative approach involving a straightforward loop choosing whatever random characters were available for a cell. This quickly hit a wall as the particular row wouldn't always agree with the column for a cell. 

At this point I hope to apply backtracking to finish that aspect of the project. I'll be starting with Java and moving from there.