Don’t believe the client

You have to disbelieve the client. When a mid-sized client tells you “this is the solution, when can we do it?” it’s not the same as a large corporate client telling you that they’ve selected a software package and now need implementation advice. The latter has both the expertise and the experience to have made a rational decision, and you can feel safe moving forward. The former has probably fallen for somebody’s sales pitch, or is engaged in what one of my mentors called “management by magazine.”

National Transportation Data Challenge @ JupyterCon

I was asked to join the team running the National Transportation Data Challenge, not as a data scientist, but as a project manager to help keep all the big data people on track and moving forward. It’s an interesting use of my skillset and after a slow start I’ve been devoting more and more of my time to it. After lots of work behind the scenes, we showcased the Challenge to a more general public last week at JupyterCon in New York.

Retail: A tale of two charts

… a more careful understanding of the data might suggest that Amazon is a lot more important than it may seem at first glance, and that the lack of benefit from “savings” in other areas can easily be understood when the data is explored more thoroughly. As a data scientist, that’s the kind of thing I need to be able to do in order to tell a story that is not only compelling, but accurate.

Udacity Deep Learning Nanodegree — Part 3

I found TensorFlow initially confusing but then quite comfortable. It’s odd how, after programming in a language like Python for a while, it becomes confusing that you have to declare “placeholders” (variables) and constants up front, then initialize them.

It’s not just a problem “over there.”

The American Red Cross is running computers with an obsolete and unsupported operating system, and using them to collect HIPAA-protected personal health and other information! If a Russian or North Korean hacker can break your system because you’re using 16-year-old software that hasn’t been supported for two years, then you as an organization have failed.

Udacity Deep Learning Nanodegree – Part II

I left last week’s PyData Meetup with more questions than answers. Questions like “why does that neural net I just wrote perform the way it does?” So, with a couple of weeks left until the next project is due, I decided to go back and revisit the second half of the neural networks topic before moving forward.

PyData SoCal: Explaining Black Box ML Predictions

Tonight I joined the first Southern California PyData meetup. It featured two speakers discussing how to better understand the predictions made by machine-learning models, and why it might be important to do so. I was impressed by the capabilities of the packages demonstrated, and by how important such capabilities are likely to be as we move forward with deep-learning-based automation that could cause catastrophic results if it fails in unexpected ways.
