[00:07] so i found a full series of a podcast on CNBC
[00:07] presented by BP
[02:02] looks like cnbc uses Subversion
[02:10] godane: hmm where?
[02:29] https://qa.register.cnbc.com/images/opt/.svn/
[02:29] http://qa.register.cnbc.com/partners/.svn/
[02:29] http://qa.register.cnbc.com/js/.svn/
[02:30] http://qa.register.cnbc.com/xml/.svn/
[02:37] ahaha
[02:43] so i got some full series from CNBC site
[02:43] not just Mad Money
[02:43] but On The Money
[02:44] and Fast Money
[02:44] Options Actions
[02:44] *action
[02:45] Business Of Innovation
[02:46] looks like Business of Innovation was presented by IBM
[02:56] so looks like when i search for 2008-04 here i get one from may 1, 2008 for some reason: https://archive.org/search.php?query=collection%3Awsjtecham%20date%3A2008-04&sort=-publicdate
[03:31] if this isn't already an IA project, I figure anything in here should be automatically archived http://perma.cc/
[03:33] oh, excellent. hiding in their about page is that IA is a partner
[03:33] I can only hope this means auto-crawl or similar
[04:15] so i found out that the video podcasts of cnbc stopped around 09/30/2013
[06:52] Every link stored by Perma.cc results in an archive file, using the industry-standard WARC format. Perma.cc will replicate these archives to available third party services. For example, we plan to automatically submit archives to the Internet Archive. Most Permalinks will therefore have a corresponding address at one or more third-party servers.
[06:54] "plan to"
[06:54] at least they're using warc!
[06:58] it's only been around for a few months by the looks of it
[07:18] i'm brute forcing a grab of support.gateway.com for pdf manuals
[07:18] turns out a lot of them use numbers: https://support.gateway.com/s/Manuals/Mobile/8511892.pdf
[11:10] woohoo!
[11:10] disc repair thingie arrived
[11:17] not the highest build quality, but solid
[13:20] xmc: now, they just need to make sure that their warc
[13:20] will not make IA's deriver go wild...
[13:41] nico: has that been a problem?
[13:41] does wget sometimes make bad warcs?
[13:52] it can happen
[13:52] seen with the ArchiveBot
[13:52] wget quit but the file is corrupted
[14:00] huh... so I guess the best practice is to run a validator before uploading?
[14:02] OK, that's a pretty simple thing to do... I'll just add that to my workflow.
[14:25] it occurs when you mirror big sites
[14:26] hmm... good tip. Running the validator turns out to be so simple that it's almost irresponsible to not do it. :-)
[14:34] what do you use?
[14:35] i have some .warc.gz that i believe are corrupt
[14:35] that i want to check
[14:39] hmm, let me double-check... yes: https://github.com/internetarchive/warctools/
[14:45] thank you
[16:33] /dev/mapper/vg_raid5-lv_raid5 12T 32M 12T 1% /mnt/data2
[16:33] adding capacity is fun but expensive :)
[17:07] storage is good nico ;-)
[17:10] * nico is backporting transmission to debian wheezy
[17:10] wtf is the dependency to qt5-qmake
[21:52] poor underscor
[21:53] "just because it's plug-compatible doesn't mean you have to keep testing it!"
[22:57] there we go, finally got this box stable
[23:17] know that the first 19 issues of the 1941 Svoboda newspaper are not on their site for some reason
[23:35] so i'm about 1/3 of the way there with svoboda_newspaper
[23:36] i'm not going to be able to get anything from 2014 cause i think that's for subscribers
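
On the earlier question about checking .warc.gz files that might be corrupt before uploading: a rough first-pass check can be done with nothing but the Python standard library. This is only a sketch, not the warctools validator mentioned above. The function name `check_warc_gz` is made up for illustration, and the check is deliberately shallow: it decompresses the whole file once (which catches truncated or damaged gzip streams, the "wget quit but the file is corrupted" case) and confirms that the decompressed data starts with `WARC/`. It does not parse or validate individual records.

```python
import gzip
import zlib


def check_warc_gz(path, chunk_size=1 << 20):
    """Hypothetical helper: return (ok, message) for a .warc.gz file.

    Only checks gzip integrity and the leading "WARC/" marker;
    it does not validate WARC record headers or lengths.
    """
    total = 0
    try:
        with gzip.open(path, "rb") as fh:
            # A WARC file's first record begins with a "WARC/<version>" line.
            head = fh.read(5)
            if head != b"WARC/":
                return False, "decompressed data does not start with WARC/"
            total = len(head)
            # Read to the end so every gzip member is decompressed and
            # its CRC checked; truncation surfaces as EOFError.
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (EOFError, OSError, zlib.error) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, f"ok, {total} bytes decompressed"
```

Something like `ok, msg = check_warc_gz("example.warc.gz")` (the filename is a placeholder) gives a quick yes/no before running a fuller validator or uploading.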