#archiveteam-bs 2014-03-08,Sat


Time Nickname Message
00:07 🔗 godane so i found a full series of a podcast on CNBC
00:07 🔗 godane presented by BP
02:02 🔗 godane looks like cnbc uses Subversion
02:10 🔗 balrog godane: hmm where?
02:29 🔗 godane https://qa.register.cnbc.com/images/opt/.svn/
02:29 🔗 godane http://qa.register.cnbc.com/partners/.svn/
02:29 🔗 godane http://qa.register.cnbc.com/js/.svn/
02:30 🔗 godane http://qa.register.cnbc.com/xml/.svn/
02:37 🔗 xmc ahaha
02:43 🔗 godane so i got some full series from CNBC site
02:43 🔗 godane not just Mad Money
02:43 🔗 godane but On The Money
02:44 🔗 godane and Fast Money
02:44 🔗 godane Opitions Actions
02:44 🔗 godane *action
02:45 🔗 godane Business Of Innovation
02:46 🔗 godane looks like Business of Innovation was presented by IBM
02:56 🔗 godane so looks like when i search for 2008-04 here i get one from may 1, 2008 for some reason: https://archive.org/search.php?query=collection%3Awsjtecham%20date%3A2008-04&sort=-publicdate
03:31 🔗 Aranje if this isn't already an IA project, I figure anything in here should be automatically archived http://perma.cc/
03:33 🔗 Aranje oh, excellent. hiding in their about page is that IA is a partner
03:33 🔗 Aranje I can only hope this means auto-crawl or similar
04:15 🔗 godane so i found out that the video podcasts of cnbc have stopped around 09/30/2013
06:52 🔗 DFJustin Every link stored by Perma.cc results in an archive file, using the industry-standard WARC format. Perma.cc will replicate these archives to available third party services. For example, we plan to automatically submit archives to the Internet Archive. Most Permalinks will therefore have a corresponding address at one or more third-party servers.
06:54 🔗 xmc "plan to"
06:54 🔗 xmc at least they're using warc!
06:58 🔗 DFJustin it's only been around for a few months by the looks of it
07:18 🔗 godane i'm brute forcing a grab of support.gateway.com for pdf manuals
07:18 🔗 godane turns out a lot of them use numbers: https://support.gateway.com/s/Manuals/Mobile/8511892.pdf
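A minimal sketch of the enumeration godane describes, assuming the manuals sit at sequential numeric filenames under the one path shown above. The function name `candidate_urls` and the range bounds are hypothetical; the log does not say which numbers actually exist or what tool was used. The sketch only builds the list of URLs to try and does not fetch anything.

```python
# URL pattern taken from the single example in the log; whether other
# numbers or other directories follow it is an assumption.
TEMPLATE = "https://support.gateway.com/s/Manuals/Mobile/{}.pdf"


def candidate_urls(start, stop, template=TEMPLATE):
    """Yield one candidate URL per integer in [start, stop)."""
    for number in range(start, stop):
        yield template.format(number)


if __name__ == "__main__":
    # Hypothetical range around the one known-good number.
    for url in candidate_urls(8511890, 8511895):
        print(url)
```

The printed list can be handed to whatever downloader is already in use; most candidates in a range like this are likely to return 404.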
11:10 🔗 joepie91_ woohoo!
11:10 🔗 joepie91_ disc repair thingie arrived
11:17 🔗 joepie91_ not the highest build quality, but solid
13:20 🔗 nico xmc: now, they just need to make sure that their warc
13:20 🔗 nico will not make IA's deriver go wild...
13:41 🔗 SadDM nico: has that been a problem?
13:41 🔗 SadDM does wget sometimes make bad warcs?
13:52 🔗 nico it can happen
13:52 🔗 nico seen with the ArchiveBot
13:52 🔗 nico wget quit but the file is corrupted
14:00 🔗 SadDM huh... so I guess the best practice is to run a validator before uploading?
14:02 🔗 SadDM OK, that's a pretty simple thing to do... I'll just add that to my workflow.
14:25 🔗 nico it occurs when you mirror big sites
14:26 🔗 SadDM hmm... good tip. Running the validator turns out to be so simple that it's almost irresponsible to not do it. :-)
14:34 🔗 nico what do you use?
14:35 🔗 nico i have some .warc.gz that i believe are corrupt
14:35 🔗 nico that i want to check
14:39 🔗 SadDM hmm, let me double-check... yes: https://github.com/internetarchive/warctools/
14:45 🔗 nico thank you
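For the "wget quit but the file is corrupted" case, a quick first check is whether a .warc.gz decompresses all the way to the end. The sketch below uses only the Python standard library; the function name `gzip_stream_ok` is hypothetical and this is not the warctools validator linked above. It tests the gzip layer only, so a file that passes can still contain malformed WARC records.

```python
import gzip
import sys
import zlib


def gzip_stream_ok(path, chunk_size=1024 * 1024):
    """Read every gzip member in `path` to the end.

    Returns (True, None) if the whole file decompresses cleanly,
    otherwise (False, error message).
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read and discard the data; only errors matter here.
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        # Truncation raises EOFError; bad headers or checksums raise
        # an OSError subclass; damaged deflate data raises zlib.error.
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    for name in sys.argv[1:]:
        ok, err = gzip_stream_ok(name)
        print(name, "OK" if ok else "CORRUPT: " + err)
```

Run it with one or more .warc.gz paths as arguments. A truncated file, which is what an interrupted crawl usually leaves behind, is reported as corrupt.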
16:33 🔗 nico /dev/mapper/vg_raid5-lv_raid5 12T 32M 12T 1% /mnt/data2
16:33 🔗 nico adding capacity is fun but expensive :)
17:07 🔗 midas storage is good nico ;-)
17:10 🔗 * nico is backporting transmission to debian wheezy
17:10 🔗 nico wtf is the dependency to qt5-qmake
21:52 🔗 dugdig poor underscor
21:53 🔗 xmc "just because it's plug-compatible doesn't mean you have to keep testing it!"
22:57 🔗 underscor there we go, finally got this box stable
23:17 🔗 godane know that the first 19 issues of the 1941 Svoboda newspaper are not on their site for some reason
23:35 🔗 godane so i'm about a 1/3 there with svoboda_newspaper
23:36 🔗 godane i'm not going to be able to get anything from 2014 cause i think thats for subscribers
