So you’ve found your headings and data, identified patterns and fixed your anomalies. (Seriously, it’s crucial that you check out Part 1 of this series.) Time to start coding!
1 - Work out an approach
Chances are the data won’t be arranged in a way that makes it easily extractable (that is why we’re doing this, after all), so you’ll have to get a little creative here. Referencing the notes you made in Part 1 (you did take notes like I suggested, right?), figure out a way to extract the data you need. By analysing your column headings, data, patterns and anomalies, you’ll be able to come up with a basic algorithm that captures the bare minimum of data you need. Don’t worry too much about getting this completely right just yet.
2 - Don’t try to bite off more than you can chew
This is something that we as programmers regularly fall victim to. Don’t try to solve the entire problem in one go. Identify a small problem within the bigger problem. Find a solution for the small problem. Apply the solution. Move on to the next small problem within the bigger problem.
Focus on accurately parsing the headers. Then focus on accurately parsing the data for each header. And, finally, focus on cleaning up whatever remaining anomalies you identified in your earlier analysis.
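To make that order of attack concrete, here is a minimal sketch in Python of the three small problems solved one at a time. The sample data, the column names and the function names (`parse_headers`, `parse_rows`, `clean_rows`) are all made up for illustration, and it assumes your source is comma-separated text; your own data will almost certainly need different rules.

```python
import csv

# Hypothetical messy export: padded headers, a blank line, a short row.
RAW = """ Name , Joined ,Amount
Alice,2015-03-01,10

Bob,2015-04-12
"""

def parse_headers(line):
    """Small problem 1: normalise the header row and nothing else."""
    return [h.strip().lower() for h in next(csv.reader([line]))]

def parse_rows(lines, headers):
    """Small problem 2: map each non-empty line onto the headers."""
    rows = []
    for values in csv.reader(lines):
        values = [v.strip() for v in values]
        if not any(values):
            continue  # skip blank lines
        rows.append(dict(zip(headers, values)))
    return rows

def clean_rows(rows, headers):
    """Small problem 3: handle anomalies (here, missing or empty cells become None)."""
    return [{h: row.get(h) or None for h in headers} for row in rows]

lines = RAW.splitlines()
headers = parse_headers(lines[0])
rows = clean_rows(parse_rows(lines[1:], headers), headers)
```

Because each step is its own function, you can get the headers right and check them before you write a single line of row-parsing code.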
3 - Final Checks
Almost there. Double-check that the results of your script are accurate. Remember, this data is more than likely pretty messy, so look out for unexpected data types and, where necessary, include checks to catch them (numbers stored as text, for example). Also, where possible, convert and store more complicated data types as they were intended (i.e. convert dates to date objects before storing them).
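As a rough sketch of what those checks could look like in Python: the helpers below (`to_number` and `to_date` are hypothetical names) turn numbers stored as text into real numbers and date strings into date objects, returning `None` for anything they can’t make sense of so you can flag it for review. The two date formats are assumptions; swap in whichever formats you found during your analysis.

```python
from datetime import date, datetime

def to_number(value):
    """Convert text such as '1,200' or ' 3.5 ' to int or float; None if not numeric."""
    if isinstance(value, (int, float)):
        return value
    text = str(value).replace(",", "").strip()
    try:
        return int(text)
    except ValueError:
        try:
            return float(text)
        except ValueError:
            return None

def to_date(value, formats=("%Y-%m-%d", "%d/%m/%Y")):
    """Try each assumed format in turn; None if none of them match."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None
```

Returning `None` rather than raising an error lets the script finish its run, and you can then count the `None` values per column to see how much of the data still needs attention.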
By now you should have all your data nice and tidy. Go have some fun with it!