Feb 08 2009

“The more labels you have for yourself, the dumber they make you.”

Tag: Uncategorized | Dan @ 2:36 pm

The title is from Paul Graham’s most recent essay: Keep Your Identity Small.


Jan 27 2009

Finding common pairs using SQL

Tag: Uncategorized | Dan @ 8:34 pm

I think a good technical article is well overdue. I work in a merge center for a Fortune 50 hardware manufacturer, where products from various places are brought together, consolidated into a single order (well, as few shipments as possible), and then shipped on to the customer. As you might guess, there are a number of parts “picked” to fulfill an order.

Last week my co-workers were working to reduce travel time to pick certain types of products. They had taken the obvious step of pulling the shipped quantities of parts and doing a Pareto analysis to identify the highest-volume parts (obviously to place closer to the conveyor). But then they decided to see if they could identify parts that are frequently picked together. That seemed a little trickier, so they called me. Here’s how I did it:

First I pulled the raw data from our Data Warehouse (obviously this step would vary from company to company).  The format was approximately as follows (disguised to protect my employer and, therefore, myself).

Identifier  Pick part  Quantity
1234        ABC        1
1234        DEF        1
5678        FGH        1
9012        ABC        1
5644        XYZ        1
5644        DEF        1

Ad nauseam.

So how do we tackle this? The trick is to self-join the table so that items are paired up by identifier. Conceptually, the join explodes the table to show every combination of parts within each identifier (this is called a Cartesian product), and then an aggregate function collapses it back down so we can sort by the number of occurrences. Remember, though, that an occurrence is an identifier that contains both parts, not a count of the parts themselves. I implemented this in MS Access because that’s what my peers have access to (official policy is to not have SQL Server 2008 installed on your laptop… Don’t look at me like that, I would never do such a thing… *cough*). Here’s a sketch of the code:

select a.[part picked], b.[part picked], count(*) as Occurrences
from raw_data as a, raw_data as b
where
a.identifier = b.identifier and
a.[part picked] < b.[part picked]
group by a.[part picked], b.[part picked]
order by count(*) desc
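For anyone without Access handy, here is a minimal sketch of the same self-join run through Python's built-in sqlite3 module. It is not what I ran at work: the column name part_picked is an assumption, SQLite's dialect differs slightly from Access, and order 7777 is invented so that one pair shows up more than once.

```python
import sqlite3

# Sample rows from the post, plus one invented order (7777) so that a
# pair occurs more than once. The column names are assumptions.
rows = [
    (1234, "ABC", 1),
    (1234, "DEF", 1),
    (5678, "FGH", 1),
    (9012, "ABC", 1),
    (5644, "XYZ", 1),
    (5644, "DEF", 1),
    (7777, "ABC", 1),  # hypothetical row
    (7777, "DEF", 1),  # hypothetical row
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table raw_data (identifier integer, part_picked text, quantity integer)"
)
conn.executemany("insert into raw_data values (?, ?, ?)", rows)

# Self-join on identifier; the "<" keeps each pair in one fixed order,
# so (ABC, DEF) and (DEF, ABC) are not counted separately.
query = """
select a.part_picked, b.part_picked, count(*) as occurrences
from raw_data as a
join raw_data as b on a.identifier = b.identifier
where a.part_picked < b.part_picked
group by a.part_picked, b.part_picked
order by occurrences desc, a.part_picked, b.part_picked
"""
pairs = conn.execute(query).fetchall()
conn.close()

for first, second, occurrences in pairs:
    print(first, second, occurrences)
# prints:
# ABC DEF 2
# DEF XYZ 1
```

Identifiers with only one part (5678 and 9012 here) drop out on their own, because the "<" comparison never matches a row against itself.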

A few last notes: I don’t use Access’s GUI for doing queries, so I have no idea how you’d do that through the dialog box. Just hit “SQL View” and type it in. The “less than” comparison between the two part columns is there because we don’t want to count part combinations twice (so we are enforcing an ordering of the pairs). Also, it should go without saying, but a key idea in all of this programming and analysis stuff is: don’t think of the specific problem, solve the problem in the abstract! There’s a reason I called this post “common pairs”; it is not only about how to find items that are picked together. You could use this to identify web visitors with common IP addresses (IP address would be the identifier and users would be “parts”), commonly purchased items among customers (identifier is customer, items are “parts”), and so on. Try to figure out the generic problem that is being solved before you focus on specifics.
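To make the “solve it in the abstract” point concrete, here is a rough sketch of the same idea as a generic Python function that knows nothing about parts or orders. The function name common_pairs and the IP/user data are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def common_pairs(records):
    """Count how many identifiers contain each unordered pair of items.

    `records` is any iterable of (identifier, item) tuples.
    """
    items_by_identifier = {}
    for identifier, item in records:
        items_by_identifier.setdefault(identifier, set()).add(item)

    counts = Counter()
    for items in items_by_identifier.values():
        # Sorting fixes the order within each pair, which plays the same
        # role as the "<" comparison in the SQL version.
        counts.update(combinations(sorted(items), 2))
    return counts.most_common()


# Hypothetical data: the IP address is the identifier, users are the "parts".
visits = [
    ("10.0.0.1", "alice"),
    ("10.0.0.1", "bob"),
    ("10.0.0.2", "alice"),
    ("10.0.0.2", "bob"),
    ("10.0.0.2", "carol"),
    ("10.0.0.3", "carol"),
]
shared = common_pairs(visits)
print(shared[0])  # (('alice', 'bob'), 2)
```

One difference from the SQL: this version keeps a set of items per identifier, so an item repeated within one identifier is only counted once, whereas count(*) would count the duplicate rows.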

Thanks, let me know what you think.


Jan 01 2009

Upgraded Wordpress

Tag: Uncategorized | Dan @ 1:45 pm

I guess I may as well start the new year with shiny new software, right?


Jan 01 2009

Happy New Year!

Tag: Uncategorized | Dan @ 10:56 am

Haven’t updated in a while. Finally finished up the degree and am now resuming "normal life." Here’s looking forward to a great 2009.


Jun 27 2008

Neural input device

Tag: news | Dan @ 5:04 pm

There is a review of a really cool device over at Tech Report. Unfortunately the review and comments are pretty stupid. My first thought is that this is fantastic: here is a device that lets disabled people play games and more.

Here at UNR we have a professor in the Computer Science department, Eelke Folmer, who does research in accessible gaming. Very cool!


Jun 19 2008

Icahn report goes live

Tag: news | Dan @ 6:03 pm

So after many months I noticed Carl Icahn’s blog has gone live with several interesting articles about, as you might guess, the incompetence of CEOs and the failings of corporate governance. Interesting reading.


Jun 17 2008

Demand for engineers grows with data centers

Tag: tech | Dan @ 5:45 am

Interesting story at the New York Times today. I thought it would be an article about the need for engineering expertise in data mining and the like, but it turns out to be about the growing demand for mechanical and environmental engineers to build the physical infrastructure that our virtual infrastructure runs on.

They feature a quote from Dr. Jon Koomey, who is the author of one of my top three books (probably #1) for data analysis: Turning Numbers into Knowledge.


Jun 16 2008

Optimization tree

Tag: Uncategorized | Dan @ 8:53 pm

Not optimal traversal of trees; rather, a tree that sketches the field of optimization.

Browsing the web, I came across the NEOS Optimization Tree at Argonne National Laboratory. It hasn’t been updated since I wrote my first web page, but the graphic is a nice overview of the different areas (and let’s be honest, a lot of that math was figured out before the web even came around).


Dec 11 2007

Expertise

Tag: Personal | Dan @ 1:23 pm

From the medical journal Anesthesiology:

Expertise is more than simply having extensive factual knowledge or competent skills. Experts have specific psychological attributes such as self-confidence, excellent communication skills, adaptability, and risk tolerance. They also have specific skills, including highly developed attention to what is relevant, ability to identify exceptions to the rules, flexibility to changing situations, effective performance under stress, and ability to make decisions and initiate actions quickly based on incomplete data.