Time |
Nickname |
Message |
00:07
|
godane |
so i found a full series of a podcast on CNBC |
00:07
|
godane |
presented by BP |
02:02
|
godane |
looks like cnbc uses Subversion |
02:10
|
balrog |
godane: hmm where? |
02:29
|
godane |
https://qa.register.cnbc.com/images/opt/.svn/ |
02:29
|
godane |
http://qa.register.cnbc.com/partners/.svn/ |
02:29
|
godane |
http://qa.register.cnbc.com/js/.svn/ |
02:30
|
godane |
http://qa.register.cnbc.com/xml/.svn/ |
02:37
|
xmc |
ahaha |
02:43
|
godane |
so i got some full series from CNBC site |
02:43
|
godane |
not just Mad Money |
02:43
|
godane |
but On The Money |
02:44
|
godane |
and Fast Money |
02:44
|
godane |
Options Actions |
02:44
|
godane |
*action |
02:45
|
godane |
Business Of Innovation |
02:46
|
godane |
looks like Business of Innovation was presented by IBM |
02:56
|
godane |
so looks like when i search for 2008-04 here i get one from may 1, 2008 for some reason: https://archive.org/search.php?query=collection%3Awsjtecham%20date%3A2008-04&sort=-publicdate |
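One way to sidestep the ambiguity of a partial date such as `date:2008-04` is to spell out the range explicitly. The minimal sketch below only builds a query URL for archive.org's `advancedsearch.php` endpoint; it assumes that endpoint accepts a Lucene-style range on the `date` field and the `fl[]`, `sort[]` and `output=json` parameters. It does not explain why the May 1 item appeared in the original search.

```python
from urllib.parse import urlencode


def ia_search_url(collection: str, start: str, end: str, rows: int = 100) -> str:
    """Build an archive.org advancedsearch URL limited to an explicit date range.

    Assumption: IA search accepts Lucene-style ranges such as
    date:[2008-04-01 TO 2008-04-30].
    """
    query = f"collection:{collection} AND date:[{start} TO {end}]"
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # fields to return for each hit
        ("fl[]", "date"),
        ("sort[]", "date asc"),
        ("rows", str(rows)),
        ("output", "json"),
    ]
    # urlencode percent-encodes ':' '[' ']' and turns spaces into '+'
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


print(ia_search_url("wsjtecham", "2008-04-01", "2008-04-30"))
```

Listing the `date` field in the results makes it easy to see which items fall just outside the intended month.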
03:31
|
Aranje |
if this isn't already an IA project, I figure anything in here should be automatically archived http://perma.cc/ |
03:33
|
Aranje |
oh, excellent. hiding in their about page is that IA is a partner |
03:33
|
Aranje |
I can only hope this means auto-crawl or similar |
04:15
|
godane |
so i found out that cnbc's video podcasts stopped around 09/30/2013 |
06:52
|
DFJustin |
Every link stored by Perma.cc results in an archive file, using the industry-standard WARC format. Perma.cc will replicate these archives to available third party services. For example, we plan to automatically submit archives to the Internet Archive. Most Permalinks will therefore have a corresponding address at one or more third-party servers. |
06:54
|
xmc |
"plan to" |
06:54
|
xmc |
at least they're using warc! |
06:58
|
DFJustin |
it's only been around for a few months by the looks of it |
07:18
|
godane |
i'm brute forcing a grab of support.gateway.com for pdf manuals |
07:18
|
godane |
turns out a lot of them use numbers: https://support.gateway.com/s/Manuals/Mobile/8511892.pdf |
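Numeric filenames like the one above make this kind of grab easy to script. A minimal sketch in Python, assuming every manual lives under the same `/s/Manuals/Mobile/` path as the example (other category directories are unknown) and that the server answers `HEAD` requests. The helper names and the number range in the usage note are hypothetical.

```python
import time
import urllib.request

# Pattern copied from the example URL; other directories besides "Mobile" are unknown.
BASE = "https://support.gateway.com/s/Manuals/Mobile/{num}.pdf"


def candidate_urls(start: int, stop: int) -> list[str]:
    """Return one candidate manual URL for every number in [start, stop)."""
    return [BASE.format(num=n) for n in range(start, stop)]


def exists(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the server answers 200."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (404 etc.) and socket timeouts all derive from OSError
        return False


def scan(start: int, stop: int, delay: float = 1.0) -> list[str]:
    """Probe each candidate in turn, pausing between requests, and return the hits."""
    found = []
    for url in candidate_urls(start, stop):
        if exists(url):
            found.append(url)
        time.sleep(delay)  # keep the request rate polite
    return found
```

For example, `scan(8511890, 8511900)` would probe ten neighbours of the known file. A server that rejects `HEAD` would make every probe look like a miss, so it is worth checking one known-good URL first.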
11:10
|
joepie91_ |
woohoo! |
11:10
|
joepie91_ |
disc repair thingie arrived |
11:17
|
joepie91_ |
not the highest build quality, but solid |
13:20
|
nico |
xmc: now, they just need to make sure that their warc |
13:20
|
nico |
will not make IA's deriver go wild... |
13:41
|
SadDM |
nico: has that been a problem? |
13:41
|
SadDM |
does wget sometimes make bad warcs? |
13:52
|
nico |
it can happen |
13:52
|
nico |
seen with the ArchiveBot |
13:52
|
nico |
wget quit but the file is corrupted |
14:00
|
SadDM |
huh... so I guess the best practice is to run a validator before uploading? |
14:02
|
SadDM |
OK, that's a pretty simple thing to do... I'll just add that to my workflow. |
14:25
|
nico |
it occurs when you mirror big sites |
14:26
|
SadDM |
hmm... good tip. Running the validator turns out to be so simple that it's almost irresponsible to not do it. :-) |
14:34
|
nico |
what do you use? |
14:35
|
nico |
i have some .warc.gz that i believe are corrupt |
14:35
|
nico |
that i want to check |
14:39
|
SadDM |
hmm, let me double-check... yes: https://github.com/internetarchive/warctools/ |
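For a quick first pass before running warctools, the stdlib-only sketch below (not part of warctools) decompresses a `.warc.gz` end to end. A file cut off because wget exited mid-write typically surfaces as an `EOFError`, and a damaged gzip member as `zlib.error` or `gzip.BadGzipFile` (an `OSError` subclass). It checks only gzip integrity and the leading `WARC/` version line, not the structure of individual records; the function name is hypothetical.

```python
import gzip
import sys
import zlib


def check_warc_gz(path: str, chunk_size: int = 1 << 20) -> tuple[bool, str]:
    """Decompress a .warc.gz fully and report whether it reads cleanly.

    Only gzip integrity and the initial "WARC/" line are checked; this is
    not a full WARC validator.
    """
    try:
        with gzip.open(path, "rb") as fh:
            if fh.read(5) != b"WARC/":
                return False, "does not start with a WARC/ version line"
            # Read every gzip member to the end so truncation or CRC errors surface
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # Usage: python check_warc.py file1.warc.gz file2.warc.gz ...
    for p in sys.argv[1:]:
        ok, msg = check_warc_gz(p)
        print(f"{p}: {msg}")
```

A file that passes here can still contain malformed records, so it complements rather than replaces a real validator.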
14:45
|
nico |
thank you |
16:33
|
nico |
/dev/mapper/vg_raid5-lv_raid5 12T 32M 12T 1% /mnt/data2 |
16:33
|
nico |
adding capacity is fun but expensive :) |
17:07
|
midas |
storage is good nico ;-) |
17:10
|
* |
nico is backporting transmission to debian wheezy |
17:10
|
nico |
wtf is with the dependency on qt5-qmake |
21:52
|
dugdig |
poor underscor |
21:53
|
xmc |
"just because it's plug-compatible doesn't mean you have to keep testing it!" |
22:57
|
underscor |
there we go, finally got this box stable |
23:17
|
godane |
note that the first 19 issues of the 1941 Svoboda Newspaper are not on their site for some reason |
23:35
|
godane |
so i'm about 1/3 of the way there with svoboda_newspaper |
23:36
|
godane |
i'm not going to be able to get anything from 2014 because i think that's for subscribers |