[00:06] *** londoncal has quit IRC (Remote host closed the connection) [00:11] caber: the archivebot crawl will be part of a larger pack of uploads, so it won't have individual meta information [00:11] ok [00:11] caber: see eg. https://archive.org/details/archiveteam_archivebot_go_20150312050003 [00:12] ok [00:12] *** Ymgve has quit IRC () [00:13] who archived archive.org? :_) [00:14] (with other words, what happens if their building burns down, and they get some nasty skynet-like virus at the same time? do they have off-site tape?) [00:14] caber: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK [00:14] thanks balrog [00:15] caber: they have two locations and there's four copies of everything iirc [00:15] but it's still not foolproof [00:15] balrog: of course that is a page on the wiki [00:15] hah [00:20] yeah, it are mainly proposal [00:20] but - if the internet archive burns, we're doomed? [00:20] so to speak [00:20] *** www2 has quit IRC (Ping timeout: 306 seconds) [00:21] well, the internet archive /has/ burned, sort of [00:21] http://www.theverge.com/2013/11/7/5076166/the-internet-archive-seeks-donations-after-fire-destroys-equipment [00:21] *** rose has joined #archiveteam [00:25] balrog: added a note here: http://archiveteam.org/index.php?title=Talk:INTERNETARCHIVE.BAK#Other_anticipated_problems [00:26] re: bad actor prevention [00:26] I'd say it at least raises the bar for bad actors to the point of "unlikely anybody will bother", assuming the implementation works as designed [00:31] *** mistym has quit IRC (Remote host closed the connection) [00:32] *** cbb2 has joined #archiveteam [00:34] *** TheFifthH has quit IRC (Quit: ChatZilla 0.9.91.1 [Firefox 36.0.1/20150305021524]) [00:35] *** cbb has quit IRC (Read error: Operation timed out) [00:37] *** Start has quit IRC (Disconnected.) 
[00:47] *** Start has joined #archiveteam [01:05] *** mistym has joined #archiveteam [01:10] *** rose has quit IRC (Leaving) [01:27] *** Start has quit IRC (Disconnected.) [01:42] *** Start has joined #archiveteam [01:49] *** arkhive has joined #archiveteam [01:50] I have an idea on an archiving project. Well, something like a project. [01:50] *** BlueMaxim has quit IRC (Read error: Operation timed out) [01:50] *** BlueMaxim has joined #archiveteam [01:50] oops i meant to do that in main AT IRC [01:53] you did [01:53] oops. oh fck [01:53] lol i just read the end of the title on window of mIRC and it said -bs [01:54] I'll copy and paste, if alriihhgt [01:54] I haven't been on AT/active for a few months. been busy, yo [01:54] Amazon has their Prime Instant Video service and they have a thing called 'Pilot Season' Right now it is like the fourth 'Pilot Season' In each season pilot episodes for original content exclusive to Amazon Prime, series are pitched. Some picked up, some are not. Now... most of the 'unsold pilots' from Amazon's Pilot Season are unavailable on their website. [01:54] I think we should get all of the ones that were not sold/not picked up as a series. and keep them before they disappear. I haven't done too much looking into various bit torrent sites to see if I can find all of them. but i know Amazon does not host a lot of them anymore [01:55] Anyway, I think it would be good to gather the unsold/not picked up pilots together to put somewhere. I was unable to find a lot of them. Some are actually pretty cool(the ones that were not picked up, that is) [01:56] here is the link to the wikipedia page(hopefully accurate enough [01:56] http://en.wikipedia.org/wiki/List_of_original_programs_distributed_by_Amazon#Pilots [01:57] What do ya'll think? Apologies for lots of posts :P Let me know :) [01:57] where would you find them? [01:58] Not sure. 
I was thinking ask arround on private torrent sites, like make requests(I have never made a request so idk if that'd work) or loook at various public/private torrent sites, search, read IMDB message boards/post on there asking for help. [01:58] ask around. maybe some AT memembers have some? [01:59] * winr4r doesn't, rarely downloads much [01:59] ah [01:59] it's an interesting idea though, probably have to be a dark collection though [01:59] same. I have the fastest connection offered in my area and it is 12Mbit/s down 896Kbit/s up lol [01:59] what does that mean? [02:00] meaning one that is in a safe place, but not publically accessible [02:00] after i snatch a copy and watch 'em all :) lol [02:00] i recently went from 8/.384 to 38/15 [02:00] feels good man [02:01] ya. i want gigabit lol. but that'll be a long while till my area gets it [02:01] Can we post the idea in the project idea section of AT wiki? [02:08] where? [02:09] either there's not a "throwing out ideas" page and there should be, or there is one and i can't find it [02:09] i thought there was one. but i am totally out of the loop so who knows [02:10] i am a little bit out of the loop too [02:10] *** primus104 has quit IRC (Leaving.) [02:12] i'm sure i've seen a page just like that, i just don't know where [02:13] http://www.archiveteam.org/index.php?title=Fire_Drill [02:13] thank you, gayrh [02:14] garyrh [02:24] *** cbb2 has quit IRC (Quit: cbb2) [02:31] *** cloudyys has joined #archiveteam [02:57] *** cloudcake has joined #archiveteam [02:58] LayerVault, often described as "version control for designers", is shutting down on April 11, 2015. [02:59] Unless we get enough items from Bing, Twitter, and URLTeam, looks like we'll have to do discovery. [02:59] *** cloudyys has quit IRC (Read error: Operation timed out) [03:00] Items are 10 characters, mixed-case alphanumeric. (Example: https://layervault.com/rev/lbG9ItfGiW) [03:00] sequential? [03:00] Doesn't look like it. 
[03:01] then forget about brute forcing them [03:02] We'd be best off randomly trying different URLs. [03:02] There's also news.layervault.com [03:03] Which should be easier, everything is numeric and sequential: https://news.layervault.com/stories/10831 [03:05] I suggest #layersalt [03:05] news.layervault can be shoved into archivebot [03:06] probably [03:06] I've already put it in [03:07] \o/ [03:08] It has 45,940 stories and 21,932 users [03:09] *** dashcloud has quit IRC (Read error: Operation timed out) [03:12] *** dashcloud has joined #archiveteam [03:20] *** mistym has quit IRC (Remote host closed the connection) [03:22] hi [03:22] can we start #froogle [03:26] wasn't that once the name of a google service [03:27] Did nobody put pjsmprints on archivebot again? It's feeling much healthier now in terms of latency, the last backup effort was partial because things got redirected to localhost. [03:32] and start finding every google service that shares piblic cultural items. [03:32] sorry irc on phone on plane. [03:41] *** godane has quit IRC (Leaving.) [03:41] so do you mean something like University of Wherever hosts everything on Google Blah, let's back it up? [03:42] *** mistym has joined #archiveteam [03:52] No. [03:53] Imean like blogger, google code, etc. [03:53] Google images,google maps [03:53] Just anything that takes user data. Just to get a sense. [03:53] Oh, I see. [03:56] Sort of a Google census. 
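Note on the LayerVault discovery discussion (03:00 to 03:03): a rough sketch of the two cases, written for a POSIX shell with 64-bit arithmetic. The function names are made up for illustration; only the ID shape (10 mixed-case alphanumeric characters) and the two URL patterns come from the log. The keyspace figure gives a sense of why enumerating /rev/ IDs is impractical, while story IDs on news.layervault.com can simply be counted.

```shell
# Hypothetical helpers; the names are illustrative, not from any existing tool.

# Number of possible /rev/ IDs: 62 symbols (a-z, A-Z, 0-9) in 10 positions.
keyspace() {
  total=1
  i=0
  while [ "$i" -lt 10 ]; do
    total=$((total * 62))
    i=$((i + 1))
  done
  echo "$total"   # 62^10 = 839299365868340224
}

# One random candidate ID with the same shape as lbG9ItfGiW.
random_rev_id() {
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 10
  echo
}

# Sequential story URLs from $1 to $2 inclusive.
story_urls() {
  n=$1
  while [ "$n" -le "$2" ]; do
    echo "https://news.layervault.com/stories/$n"
    n=$((n + 1))
  done
}

keyspace
random_rev_id
story_urls 10830 10831
```

With roughly 8.4e17 possible IDs, a random guess is very unlikely to hit a real item, which is why harvesting known links from Bing, Twitter, and URLTeam comes first.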
[04:21] *** cloudcake has quit IRC (Leaving) [04:30] *** lazlonibb has joined #archiveteam [04:31] *** lazlonibb has left [04:33] hmm is there any possibility of publicly shaming aol for killing their 20-year-old file libraries, the discussion in #aohell is kind of depressing (we can still download file descriptions, but not files) [04:46] *** johtso has quit IRC (Quit: Connection closed for inactivity) [05:06] *** wp494_ has joined #archiveteam [05:17] *** wp494 has quit IRC (Ping timeout: 740 seconds) [05:38] *** wp494_ has quit IRC (Remote host closed the connection) [05:38] *** wp494 has joined #archiveteam [06:44] *** rejon has quit IRC (Ping timeout: 512 seconds) [06:44] SketchCow, here's a start: http://www.archiveteam.org/index.php?title=Froogle [06:51] *** mst_ has joined #archiveteam [06:56] *** rejon has joined #archiveteam [06:58] *** db48x has joined #archiveteam [07:00] *** X-Scale has joined #archiveteam [07:06] *** mistym has quit IRC (Remote host closed the connection) [07:13] *** mst_ has quit IRC (Quit: bye) [07:16] *** db48x has quit IRC (Read error: Operation timed out) [07:23] *** signius has quit IRC (Read error: Operation timed out) [07:31] *** londoncal has joined #archiveteam [07:35] *** techapj has joined #archiveteam [07:35] *** dashcloud has quit IRC (Read error: Operation timed out) [07:35] *** signius has joined #archiveteam [07:38] *** dashcloud has joined #archiveteam [07:54] *** dashcloud has quit IRC (Read error: Operation timed out) [08:01] *** dashcloud has joined #archiveteam [08:11] for n in $(seq 1 1600); do wget --content-disposition -c http://felixonline.co.uk/issuearchive/issue/$n/download/; done [08:16] *** londoncal has quit IRC (Quit: Leaving...) [08:25] *** schbirid has joined #archiveteam [08:29] *** primus104 has joined #archiveteam [08:55] http://google-opensource.blogspot.co.uk/2015/03/farewell-to-google-code.html?showComment=1426180063486#c3759045954192192386 [08:55] "It's like watching Geocities go away." 
[09:07] *** rolfb has joined #archiveteam [09:24] *** techapj has quit IRC () [09:26] *** RuairiCOL has quit IRC () [09:28] *** primus104 has quit IRC (Leaving.) [09:34] *** d5af1e30 has joined #archiveteam [09:42] *** SadDM has quit IRC (Ping timeout: 370 seconds) [09:47] *** SadDM has joined #archiveteam [09:47] *** swebb sets mode: +o SadDM [09:55] *** SadDM_ has joined #archiveteam [09:55] *** swebb sets mode: +o SadDM_ [09:55] *** SadDM has quit IRC (Ping timeout: 370 seconds) [10:05] *** Ymgve has joined #archiveteam [10:10] *** SadDM_ has quit IRC (Ping timeout: 370 seconds) [10:16] *** SadDM has joined #archiveteam [10:16] *** swebb sets mode: +o SadDM [10:23] *** SadDM has quit IRC (Ping timeout: 370 seconds) [10:24] *** rolfb has quit IRC (Leaving...) [10:25] *** johtso has joined #archiveteam [10:29] *** SadDM has joined #archiveteam [10:29] *** swebb sets mode: +o SadDM [10:36] *** SadDM has quit IRC (Read error: Connection reset by peer) [10:36] *** SadDM has joined #archiveteam [10:36] *** swebb sets mode: +o SadDM [10:43] *** SadDM has quit IRC (Ping timeout: 370 seconds) [10:52] *** ohhdemgir has quit IRC (Quit: Leaving) [10:54] *** SadDM has joined #archiveteam [10:54] *** swebb sets mode: +o SadDM [11:06] *** SadDM has quit IRC (Ping timeout: 370 seconds) [11:07] *** rejon has quit IRC (Ping timeout: 512 seconds) [11:11] *** SadDM has joined #archiveteam [11:11] *** swebb sets mode: +o SadDM [11:15] *** rejon has joined #archiveteam [11:20] *** SadDM has quit IRC (Ping timeout: 370 seconds) [11:23] *** ohhdemgir has joined #archiveteam [11:25] *** Nertsy has quit IRC (Quit: Nertsy) [11:25] *** Nertsy has joined #archiveteam [11:37] *** SadDM has joined #archiveteam [11:37] *** swebb sets mode: +o SadDM [11:47] *** BlueMaxim has quit IRC (Quit: Leaving) [11:53] *** SadDM has quit IRC (Ping timeout: 370 seconds) [12:03] *** SadDM has joined #archiveteam [12:03] *** swebb sets mode: +o SadDM [12:44] *** sankin has joined #archiveteam [13:52] *** 
rejon has quit IRC (Ping timeout: 512 seconds) [14:01] *** Froggypwn has quit IRC (Read error: Connection reset by peer) [14:01] *** Froggypwn has joined #archiveteam [14:04] *** rejon has joined #archiveteam [14:15] *** d5af1e30 has quit IRC (Read error: Connection reset by peer) [14:16] I was reading trough the 'archive the Internet Archive' page on the wiki [14:18] *** Start has quit IRC (Disconnected.) [14:18] I like the though of checking cryptographically or someone really has file X or Y - without them being able to pre-calculate all possible challenges [14:18] while having an open design [14:26] *** VADemon has joined #archiveteam [14:39] *** primus104 has joined #archiveteam [14:44] You might want to join #internetarchive.bak [14:44] Which is where discussions on the 'archive IA' thing takes place. [14:44] caber: ^ [14:48] *** dshr has joined #archiveteam [14:56] *** mistym has joined #archiveteam [14:57] *** mistym has quit IRC (Remote host closed the connection) [15:03] *** Start has joined #archiveteam [15:05] https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc [15:05] oh look, google's killing another product [15:05] freebase shutting down mid-2015 [15:06] irc channel ideas? [15:06] uh, they're going to publish a dump [15:06] so the best channel name would maybe be #archiveteam [15:10] *** db48x has joined #archiveteam [15:14] oh, missed that [15:19] *** mistym has joined #archiveteam [15:25] *** dashcloud has quit IRC (Read error: Connection reset by peer) [15:27] *** dashcloud has joined #archiveteam [15:43] Yahoo #2 is rising [15:46] *** mistym has quit IRC (Remote host closed the connection) [15:51] *** Start has quit IRC (Disconnected.) 
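Note on the 14:18 remark about checking cryptographically that someone really holds a file: one common way to get that property is a nonce-based challenge, sketched below. This is an assumption for illustration, not the scheme proposed on the INTERNETARCHIVE.BAK wiki page. The function names are hypothetical, and the verifier is assumed to hold its own copy of the file (or a stock of nonce/response pairs computed in advance). Because each nonce is fresh and random, the holder cannot pre-calculate answers without the file itself.

```shell
# Hypothetical sketch: prove possession of a file by hashing a fresh nonce with it.

# SHA-256 of stdin as hex; uses sha256sum if present, otherwise shasum.
sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# Verifier side: a fresh 16-byte random nonce, hex encoded.
new_nonce() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Both sides: response = SHA-256(nonce followed by the file contents).
# $1 = nonce, $2 = path to the file.
respond() {
  { printf '%s' "$1"; cat "$2"; } | sha256
}
```

The verifier sends `new_nonce` output to the holder, both sides run `respond` with that nonce, and the answers are compared. The design stays open because its security rests only on the nonce being unpredictable, not on keeping the method secret.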
[15:56] *** Froggypwn has quit IRC (Read error: Connection reset by peer) [15:57] *** Start has joined #archiveteam [15:58] *** Froggypwn has joined #archiveteam [15:58] *** Start has quit IRC (Read error: Connection reset by peer) [15:58] *** Start has joined #archiveteam [16:02] *** mistym has joined #archiveteam [16:03] *** SadDM has quit IRC (Ping timeout: 370 seconds) [16:03] *** SadDM has joined #archiveteam [16:03] *** swebb sets mode: +o SadDM [16:15] *** godane has joined #archiveteam [16:31] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.) [16:32] arkiver: when will trovebox and rapidshare discovery start? [16:32] *** dashcloud has joined #archiveteam [16:35] http://superwaitlist.com/google-deprecates-old-webmaster-tools-api/ [16:35] google's on a kill spree this week [16:41] *** Thynix has quit IRC (Ping timeout: 186 seconds) [16:45] *** Start has quit IRC (Disconnected.) [16:49] *** primus104 has quit IRC (Leaving.) [16:53] *** Start has joined #archiveteam [17:13] *** chazchaz_ has joined #archiveteam [17:22] i'm wondering if we should do something like #froogle for yahoo [17:23] maybe #woohoo ? [17:33] yanoo [17:34] *** Start_ has joined #archiveteam [17:34] *** Start has quit IRC (Read error: Connection reset by peer) [17:35] *** Start_ is now known as Start [17:38] #ohnoo also would work [17:39] i'll just use woohoo for now [17:39] http://archiveteam.org/index.php?title=Woohoo [17:43] *** Start has quit IRC (Disconnected.) [17:59] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.) [18:00] *** habi has joined #archiveteam [18:01] *** dashcloud has joined #archiveteam [18:30] *** ionpulse has quit IRC (Ping timeout: 506 seconds) [18:34] *** habi has left [18:42] *** Start has joined #archiveteam [18:44] *** Start_ has joined #archiveteam [18:44] *** Start has quit IRC (Read error: Connection reset by peer) [18:46] could anyone with a good chunk of space archive this? 
http://demos.igmdb.org/ [18:46] best check the sizes first, iirc challenge-tv should be ~70G alone [18:46] please tell me if you take this project [18:55] *** Emcy has quit IRC (Ping timeout: 362 seconds) [18:56] *** Emcy has joined #archiveteam [19:02] *** Start_ has quit IRC (Disconnected.) [19:17] *** Start has joined #archiveteam [19:27] *** Start has quit IRC (Disconnected.) [19:32] *** SN4T14 has joined #archiveteam [19:32] *** Start has joined #archiveteam [19:45] schbirid, I would like to try, I have roughly 800 + 1000GB space [19:46] more than enough :) [19:46] i think [19:46] *** SN4T14_ has joined #archiveteam [19:46] this will be my first project then :) [19:46] excellent! [19:48] "wget -m -a demos.igmdb.org_$(date +%Y%m%d).log" should be all that's needed [19:49] *** dashcloud has quit IRC (Read error: Connection reset by peer) [19:49] what about -r for recursive? I've used wget only a few times [19:50] *** SN4T14 has quit IRC (Ping timeout: 306 seconds) [19:51] VADemon: iirc, -m implies -r [19:51] -m === -r -N -l inf --no-remove-listing [19:51] yeah [19:51] *** dashcloud has joined #archiveteam [19:52] give me some time guys :) [19:52] theoretically for a site like this one would not want the html indexes but hey, they don't hurt either and keeping them is easier [19:55] *** primus104 has joined #archiveteam [20:05] *** mistym_ has joined #archiveteam [20:10] *** mistym has quit IRC (Read error: Operation timed out) [20:14] wget.exe --directory-prefix="Y:\archive_demos.igmdb.org\" --append-output "demos.igmdb.org_$(date +%Y%m%d).log" -m http://demos.igmdb.org/ [20:14] *** BlueMaxim has joined #archiveteam [20:14] gives me Scheme missing error [20:16] sorry, that $(date) thing is linux only [20:17] would result in "20150313" [20:17] i just like timestamps ;9 [20:17] *** Start has quit IRC (Disconnected.) [20:18] I thought I would still work so didnt even ask, just worked for me under ubuntu (vm) [20:18] *** godane has quit IRC (Quit: Leaving.) 
[20:21] ok it's working, only downloading index.htmls atm [20:25] it works, thanks for helping with the setup [20:35] *** BlueMaxim has quit IRC (Ping timeout: 512 seconds) [20:36] *** BlueMaxim has joined #archiveteam [20:43] yay [20:53] *** sankin has quit IRC (Leaving.) [20:54] *** lag2 has joined #archiveteam [21:29] *** godane has joined #archiveteam [21:30] *** db48x` has joined #archiveteam [21:45] *** dshr has quit IRC (Quit: Page closed) [21:48] *** philpem has joined #archiveteam [21:49] VADemon: Dude, don't run it on windows [21:49] Jesus god, urgh [21:49] * ersi shoots self [21:50] *** db48x` has quit IRC (Read error: Operation timed out) [21:50] ? :/ [21:50] Never do archiving on Windows. Case sensitive file systems and what not.. [21:52] you're right... [21:52] *** db48x` has joined #archiveteam [22:11] *** loopholes has joined #archiveteam [22:12] Hello? [22:15] *** loopholes has quit IRC (Client Quit) [22:17] *** www2 has joined #archiveteam [22:17] *** loopholes has joined #archiveteam [22:31] *** Emcy has quit IRC (Ping timeout: 606 seconds) [22:37] *** Emcy has joined #archiveteam [22:49] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.) [22:52] *** loopholes has quit IRC (Quit: Page closed) [22:52] *** schbirid has quit IRC (Leaving) [22:54] *** dashcloud has joined #archiveteam [23:03] *** dashcloud has quit IRC (Read error: Connection reset by peer) [23:11] *** dashcloud has joined #archiveteam [23:14] *** db48x` has quit IRC (Ping timeout: 258 seconds) [23:17] *** dashcloud has quit IRC (Read error: Operation timed out) [23:20] *** dashcloud has joined #archiveteam [23:24] *** X-Scale has quit IRC (Ping timeout: 240 seconds) [23:24] *** Coderjoe has quit IRC (Ping timeout: 606 seconds) [23:26] *** Coderjoe has joined #archiveteam [23:34] *** dashcloud has quit IRC (Read error: Operation timed out) [23:38] *** dashcloud has joined #archiveteam [23:44] VADemon: also you should be using warc [23:45] is warc kinda an archive? 
[23:46] Thanks for the suggestion, I will [23:48] VADemon: yep, it's an archive format designed specifically for stuff like archiving websites [23:48] it preserves all the metadata [23:48] request/response headers and all that [23:48] it's supported by wget by default under the --warc-file flag [23:49] currently reading this http://www.archiveteam.org/index.php?title=Wget_with_WARC_output [23:49] Looks like I have to start the download once again [23:58] *** mistym_ has quit IRC (Remote host closed the connection) [23:59] *** dashcloud has quit IRC (Read error: Operation timed out)
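Note on the wget advice between 19:48 and 23:48: a re-run with WARC output might look roughly like the sketch below. It is not a tested recipe. The helper name `mirror_cmd` and the file naming are assumptions, and the helper only prints the command so it can be reviewed before anything is downloaded. The flags themselves (`-m`, `-a`, `--warc-file`, `--warc-cdx`) are GNU wget options, available in builds with WARC support.

```shell
# Hypothetical helper: print a wget mirror command with WARC output.
# $1 = site URL; the host name is reused for the log and WARC file names.
mirror_cmd() {
  url=$1
  host=${url#*://}        # strip the scheme, e.g. "http://"
  host=${host%%/*}        # strip any path after the host
  stamp=$(date +%Y%m%d)   # date stamp; needs a POSIX shell, not cmd.exe
  echo wget -m \
    -a "${host}_${stamp}.log" \
    --warc-file="${host}_${stamp}" \
    --warc-cdx \
    "$url"
}

mirror_cmd http://demos.igmdb.org/
```

On a Linux machine the printed line can be pasted into the shell as is. As noted at 19:51, `-m` already covers `-r -N -l inf --no-remove-listing`, so no separate `-r` is needed.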