#archiveteam 2015-03-13,Fri

Time Nickname Message
00:06 🔗 londoncal has quit IRC (Remote host closed the connection)
00:11 🔗 joepie91 caber: the archivebot crawl will be part of a larger pack of uploads, so it won't have individual meta information
00:11 🔗 caber ok
00:11 🔗 joepie91 caber: see eg. https://archive.org/details/archiveteam_archivebot_go_20150312050003
00:12 🔗 caber ok
00:12 🔗 Ymgve has quit IRC ()
00:13 🔗 caber who archived archive.org? :_)
00:14 🔗 caber (in other words, what happens if their building burns down, and they get some nasty skynet-like virus at the same time? do they have off-site tape?)
00:14 🔗 balrog caber: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK
00:14 🔗 caber thanks balrog
00:15 🔗 balrog caber: they have two locations and there's four copies of everything iirc
00:15 🔗 balrog but it's still not foolproof
00:15 🔗 joepie91 balrog: of course that is a page on the wiki
00:15 🔗 joepie91 hah
00:20 🔗 caber yeah, it's mainly a proposal
00:20 🔗 caber but - if the internet archive burns, we're doomed?
00:20 🔗 caber so to speak
00:20 🔗 www2 has quit IRC (Ping timeout: 306 seconds)
00:21 🔗 joepie91 well, the internet archive /has/ burned, sort of
00:21 🔗 joepie91 http://www.theverge.com/2013/11/7/5076166/the-internet-archive-seeks-donations-after-fire-destroys-equipment
00:21 🔗 rose has joined #archiveteam
00:25 🔗 joepie91 balrog: added a note here: http://archiveteam.org/index.php?title=Talk:INTERNETARCHIVE.BAK#Other_anticipated_problems
00:26 🔗 joepie91 re: bad actor prevention
00:26 🔗 joepie91 I'd say it at least raises the bar for bad actors to the point of "unlikely anybody will bother", assuming the implementation works as designed
00:31 🔗 mistym has quit IRC (Remote host closed the connection)
00:32 🔗 cbb2 has joined #archiveteam
00:34 🔗 TheFifthH has quit IRC (Quit: ChatZilla 0.9.91.1 [Firefox 36.0.1/20150305021524])
00:35 🔗 cbb has quit IRC (Read error: Operation timed out)
00:37 🔗 Start has quit IRC (Disconnected.)
00:47 🔗 Start has joined #archiveteam
01:05 🔗 mistym has joined #archiveteam
01:10 🔗 rose has quit IRC (Leaving)
01:27 🔗 Start has quit IRC (Disconnected.)
01:42 🔗 Start has joined #archiveteam
01:49 🔗 arkhive has joined #archiveteam
01:50 🔗 arkhive I have an idea on an archiving project. Well, something like a project.
01:50 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
01:50 🔗 BlueMaxim has joined #archiveteam
01:50 🔗 arkhive oops i meant to do that in main AT IRC
01:53 🔗 winr4r you did
01:53 🔗 arkhive oops. oh fck
01:53 🔗 arkhive lol i just read the end of the title on window of mIRC and it said -bs
01:54 🔗 arkhive I'll copy and paste, if alriihhgt
01:54 🔗 arkhive I haven't been on AT/active for a few months. been busy, yo
01:54 🔗 arkhive <arkhive> Amazon has their Prime Instant Video service and they have a thing called 'Pilot Season' Right now it is like the fourth 'Pilot Season' In each season pilot episodes for original content exclusive to Amazon Prime, series are pitched. Some picked up, some are not. Now... most of the 'unsold pilots' from Amazon's Pilot Season are unavailable on their website.
01:54 🔗 arkhive <arkhive> I think we should get all of the ones that were not sold/not picked up as a series. and keep them before they disappear. I haven't done too much looking into various bit torrent sites to see if I can find all of them. but i know Amazon does not host a lot of them anymore
01:55 🔗 arkhive Anyway, I think it would be good to gather the unsold/not picked up pilots together to put somewhere. I was unable to find a lot of them. Some are actually pretty cool(the ones that were not picked up, that is)
01:56 🔗 arkhive here is the link to the wikipedia page (hopefully accurate enough)
01:56 🔗 arkhive http://en.wikipedia.org/wiki/List_of_original_programs_distributed_by_Amazon#Pilots
01:57 🔗 arkhive What do ya'll think? Apologies for lots of posts :P Let me know :)
01:57 🔗 winr4r where would you find them?
01:58 🔗 arkhive Not sure. I was thinking ask around on private torrent sites, like make requests (I have never made a request so idk if that'd work) or look at various public/private torrent sites, search, read IMDB message boards/post on there asking for help.
01:58 🔗 arkhive ask around. maybe some AT members have some?
01:59 🔗 * winr4r doesn't, rarely downloads much
01:59 🔗 arkhive ah
01:59 🔗 winr4r it's an interesting idea though, probably have to be a dark collection though
01:59 🔗 arkhive same. I have the fastest connection offered in my area and it is 12Mbit/s down 896Kbit/s up lol
01:59 🔗 arkhive what does that mean?
02:00 🔗 winr4r meaning one that is in a safe place, but not publicly accessible
02:00 🔗 arkhive after i snatch a copy and watch 'em all :) lol
02:00 🔗 winr4r i recently went from 8/.384 to 38/15
02:00 🔗 winr4r feels good man
02:01 🔗 arkhive ya. i want gigabit lol. but that'll be a long while till my area gets it
02:01 🔗 arkhive Can we post the idea in the project idea section of AT wiki?
02:08 🔗 winr4r where?
02:09 🔗 winr4r either there's not a "throwing out ideas" page and there should be, or there is one and i can't find it
02:09 🔗 arkhive i thought there was one. but i am totally out of the loop so who knows
02:10 🔗 winr4r i am a little bit out of the loop too
02:10 🔗 primus104 has quit IRC (Leaving.)
02:12 🔗 winr4r i'm sure i've seen a page just like that, i just don't know where
02:13 🔗 garyrh http://www.archiveteam.org/index.php?title=Fire_Drill
02:13 🔗 arkhive thank you, gayrh
02:14 🔗 arkhive garyrh
02:24 🔗 cbb2 has quit IRC (Quit: cbb2)
02:31 🔗 cloudyys has joined #archiveteam
02:57 🔗 cloudcake has joined #archiveteam
02:58 🔗 Start LayerVault, often described as "version control for designers", is shutting down on April 11, 2015.
02:59 🔗 Start Unless we get enough items from Bing, Twitter, and URLTeam, looks like we'll have to do discovery.
02:59 🔗 cloudyys has quit IRC (Read error: Operation timed out)
03:00 🔗 Start Items are 10 characters, mixed-case alphanumeric. (Example: https://layervault.com/rev/lbG9ItfGiW)
03:00 🔗 winr4r sequential?
03:00 🔗 Start Doesn't look like it.
03:01 🔗 winr4r then forget about brute forcing them
03:02 🔗 Start We'd be best off randomly trying different URLs.
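For scale, the arithmetic behind "forget about brute forcing them" can be sketched in a few lines of shell. The 62-symbol alphabet follows from "mixed-case alphanumeric" (26 + 26 + 10); the rate of 1,000 requests per second is a purely hypothetical figure chosen for illustration, not something measured against layervault.com.

```shell
# Size of the ID space for 10-character mixed-case alphanumeric IDs.
alphabet=62   # 26 lowercase + 26 uppercase + 10 digits
length=10

keyspace=1
i=0
while [ "$i" -lt "$length" ]; do
  keyspace=$((keyspace * alphabet))
  i=$((i + 1))
done
echo "possible IDs: $keyspace"

# Hypothetical request rate, only to show the order of magnitude.
rate=1000                 # requests per second (assumption)
seconds_per_year=31536000 # 365 days
years=$((keyspace / rate / seconds_per_year))
echo "years to enumerate at $rate req/s: $years"
```

At that assumed rate an exhaustive sweep would take tens of millions of years, so harvesting known URLs from search engines and shortener dumps, or random sampling, is the only practical route.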
03:02 🔗 Start There's also news.layervault.com
03:03 🔗 Start Which should be easier, everything is numeric and sequential: https://news.layervault.com/stories/10831
03:05 🔗 Start I suggest #layersalt
03:05 🔗 winr4r news.layervault can be shoved into archivebot
03:06 🔗 winr4r probably
03:06 🔗 Start I've already put it in
03:07 🔗 winr4r \o/
03:08 🔗 Start It has 45,940 stories and 21,932 users
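Because the story IDs are numeric and sequential, a URL list for any downloader can be generated directly. This is a minimal sketch, assuming the IDs run from 1 up to the reported story count; the real highest ID may differ and some IDs may be missing. The output filename is made up for this example.

```shell
# Assumption: highest story ID equals the reported number of stories.
last=45940
out="layervault_news_urls.txt"   # hypothetical filename

# One URL per line, following the pattern of the example story link.
seq 1 "$last" | sed 's|^|https://news.layervault.com/stories/|' > "$out"

echo "wrote $(wc -l < "$out" | tr -d ' ') URLs to $out"
```

A list like this can be handed to `wget -i layervault_news_urls.txt` or to any other tool that accepts a file of URLs.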
03:09 🔗 dashcloud has quit IRC (Read error: Operation timed out)
03:12 🔗 dashcloud has joined #archiveteam
03:20 🔗 mistym has quit IRC (Remote host closed the connection)
03:22 🔗 SketchCow hi
03:22 🔗 SketchCow can we start #froogle
03:26 🔗 winr4r wasn't that once the name of a google service
03:27 🔗 WubTheCap Did nobody put pjsmprints on archivebot again? It's feeling much healthier now in terms of latency, the last backup effort was partial because things got redirected to localhost.
03:32 🔗 SketchCow and start finding every google service that shares public cultural items.
03:32 🔗 SketchCow sorry irc on phone on plane.
03:41 🔗 godane has quit IRC (Leaving.)
03:41 🔗 garyrh so do you mean something like University of Wherever hosts everything on Google Blah, let's back it up?
03:42 🔗 mistym has joined #archiveteam
03:52 🔗 SketchCow No.
03:53 🔗 SketchCow I mean like blogger, google code, etc.
03:53 🔗 SketchCow Google images, google maps
03:53 🔗 SketchCow Just anything that takes user data. Just to get a sense.
03:53 🔗 garyrh Oh, I see.
03:56 🔗 garyrh Sort of a Google census.
04:21 🔗 cloudcake has quit IRC (Leaving)
04:30 🔗 lazlonibb has joined #archiveteam
04:31 🔗 lazlonibb has left
04:33 🔗 DFJustin hmm is there any possibility of publicly shaming aol for killing their 20-year-old file libraries, the discussion in #aohell is kind of depressing (we can still download file descriptions, but not files)
04:46 🔗 johtso has quit IRC (Quit: Connection closed for inactivity)
05:06 🔗 wp494_ has joined #archiveteam
05:17 🔗 wp494 has quit IRC (Ping timeout: 740 seconds)
05:38 🔗 wp494_ has quit IRC (Remote host closed the connection)
05:38 🔗 wp494 has joined #archiveteam
06:44 🔗 rejon has quit IRC (Ping timeout: 512 seconds)
06:44 🔗 garyrh SketchCow, here's a start: http://www.archiveteam.org/index.php?title=Froogle
06:51 🔗 mst_ has joined #archiveteam
06:56 🔗 rejon has joined #archiveteam
06:58 🔗 db48x has joined #archiveteam
07:00 🔗 X-Scale has joined #archiveteam
07:06 🔗 mistym has quit IRC (Remote host closed the connection)
07:13 🔗 mst_ has quit IRC (Quit: bye)
07:16 🔗 db48x has quit IRC (Read error: Operation timed out)
07:23 🔗 signius has quit IRC (Read error: Operation timed out)
07:31 🔗 londoncal has joined #archiveteam
07:35 🔗 techapj has joined #archiveteam
07:35 🔗 dashcloud has quit IRC (Read error: Operation timed out)
07:35 🔗 signius has joined #archiveteam
07:38 🔗 dashcloud has joined #archiveteam
07:54 🔗 dashcloud has quit IRC (Read error: Operation timed out)
08:01 🔗 dashcloud has joined #archiveteam
08:11 🔗 ohhdemgir for n in $(seq 1 1600); do wget --content-disposition -c http://felixonline.co.uk/issuearchive/issue/$n/download/; done
08:16 🔗 londoncal has quit IRC (Quit: Leaving...)
08:25 🔗 schbirid has joined #archiveteam
08:29 🔗 primus104 has joined #archiveteam
08:55 🔗 ohhdemgir http://google-opensource.blogspot.co.uk/2015/03/farewell-to-google-code.html?showComment=1426180063486#c3759045954192192386
08:55 🔗 ohhdemgir "It's like watching Geocities go away."
09:07 🔗 rolfb has joined #archiveteam
09:24 🔗 techapj has quit IRC ()
09:26 🔗 RuairiCOL has quit IRC ()
09:28 🔗 primus104 has quit IRC (Leaving.)
09:34 🔗 d5af1e30 has joined #archiveteam
09:42 🔗 SadDM has quit IRC (Ping timeout: 370 seconds)
09:47 🔗 SadDM has joined #archiveteam
09:47 🔗 swebb sets mode: +o SadDM
09:55 🔗 SadDM_ has joined #archiveteam
09:55 🔗 swebb sets mode: +o SadDM_
09:55 🔗 SadDM has quit IRC (Ping timeout: 370 seconds)
10:05 🔗 Ymgve has joined #archiveteam
10:10 🔗 SadDM_ has quit IRC (Ping timeout: 370 seconds)
10:16 🔗 SadDM has joined #archiveteam
10:16 🔗 swebb sets mode: +o SadDM
10:23 🔗 SadDM has quit IRC (Ping timeout: 370 seconds)
10:24 🔗 rolfb has quit IRC (Leaving...)
10:25 🔗 johtso has joined #archiveteam
10:29 🔗 SadDM has joined #archiveteam
10:29 🔗 swebb sets mode: +o SadDM
10:36 🔗 SadDM has quit IRC (Read error: Connection reset by peer)
10:36 🔗 SadDM has joined #archiveteam
10:36 🔗 swebb sets mode: +o SadDM
10:43 🔗 SadDM has quit IRC (Ping timeout: 370 seconds)
10:52 🔗 ohhdemgir has quit IRC (Quit: Leaving)
10:54 🔗 SadDM has joined #archiveteam
10:54 🔗 swebb sets mode: +o SadDM
11:06 🔗 SadDM has quit IRC (Ping timeout: 370 seconds)
11:07 🔗 rejon has quit IRC (Ping timeout: 512 seconds)
11:11 🔗 SadDM has joined #archiveteam
11:11 🔗 swebb sets mode: +o SadDM
11:15 🔗 rejon has joined #archiveteam
11:20 🔗 SadDM has quit IRC (Ping timeout: 370 seconds)
11:23 🔗 ohhdemgir has joined #archiveteam
11:25 🔗 Nertsy has quit IRC (Quit: Nertsy)
11:25 🔗 Nertsy has joined #archiveteam
11:37 🔗 SadDM has joined #archiveteam
11:37 🔗 swebb sets mode: +o SadDM
11:47 🔗 BlueMaxim has quit IRC (Quit: Leaving)
11:53 🔗 SadDM has quit IRC (Ping timeout: 370 seconds)
12:03 🔗 SadDM has joined #archiveteam
12:03 🔗 swebb sets mode: +o SadDM
12:44 🔗 sankin has joined #archiveteam
13:52 🔗 rejon has quit IRC (Ping timeout: 512 seconds)
14:01 🔗 Froggypwn has quit IRC (Read error: Connection reset by peer)
14:01 🔗 Froggypwn has joined #archiveteam
14:04 🔗 rejon has joined #archiveteam
14:15 🔗 d5af1e30 has quit IRC (Read error: Connection reset by peer)
14:16 🔗 caber I was reading through the 'archive the Internet Archive' page on the wiki
14:18 🔗 Start has quit IRC (Disconnected.)
14:18 🔗 caber I like the thought of checking cryptographically whether someone really has file X or Y - without them being able to pre-calculate all possible challenges
14:18 🔗 caber while having an open design
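One generic way to get that property is a nonce-based challenge: the verifier sends a fresh random value, and the holder must return a hash over that value plus the file contents, which cannot be computed in advance without the file. The sketch below is not the scheme from the wiki page, only an illustration of the idea; the `respond` helper and the temporary file are hypothetical, and it assumes `sha256sum` from GNU coreutils is available.

```shell
# Hypothetical stand-in for an archived file; a real check would use the stored item.
file=$(mktemp)
printf 'example archived content\n' > "$file"

# Verifier: pick a fresh random nonce (16 bytes, hex-encoded) for every challenge.
nonce=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')

# Both sides compute SHA-256 over the nonce followed by the file contents.
respond() {
  { printf '%s' "$1"; cat "$2"; } | sha256sum | cut -d' ' -f1
}

expected=$(respond "$nonce" "$file")   # verifier, from its own copy
answer=$(respond "$nonce" "$file")     # holder, from the copy it claims to have

if [ "$answer" = "$expected" ]; then
  echo "holder proved possession"
else
  echo "mismatch"
fi
```

The design is open in the sense that nothing is secret except the unpredictability of the nonce. Its obvious limitation is that the verifier needs its own copy of the file, or a stock of nonce and answer pairs computed ahead of time, to know the expected answer.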
14:26 🔗 VADemon has joined #archiveteam
14:39 🔗 primus104 has joined #archiveteam
14:44 🔗 ersi You might want to join #internetarchive.bak
14:44 🔗 ersi Which is where discussions on the 'archive IA' thing take place.
14:44 🔗 ersi caber: ^
14:48 🔗 dshr has joined #archiveteam
14:56 🔗 mistym has joined #archiveteam
14:57 🔗 mistym has quit IRC (Remote host closed the connection)
15:03 🔗 Start has joined #archiveteam
15:05 🔗 Start https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc
15:05 🔗 Start oh look, google's killing another product
15:05 🔗 Start freebase shutting down mid-2015
15:06 🔗 Start irc channel ideas?
15:06 🔗 xmc uh, they're going to publish a dump
15:06 🔗 xmc so the best channel name would maybe be #archiveteam
15:10 🔗 db48x has joined #archiveteam
15:14 🔗 Start oh, missed that
15:19 🔗 mistym has joined #archiveteam
15:25 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
15:27 🔗 dashcloud has joined #archiveteam
15:43 🔗 VADemon Yahoo #2 is rising
15:46 🔗 mistym has quit IRC (Remote host closed the connection)
15:51 🔗 Start has quit IRC (Disconnected.)
15:56 🔗 Froggypwn has quit IRC (Read error: Connection reset by peer)
15:57 🔗 Start has joined #archiveteam
15:58 🔗 Froggypwn has joined #archiveteam
15:58 🔗 Start has quit IRC (Read error: Connection reset by peer)
15:58 🔗 Start has joined #archiveteam
16:02 🔗 mistym has joined #archiveteam
16:03 🔗 SadDM has quit IRC (Ping timeout: 370 seconds)
16:03 🔗 SadDM has joined #archiveteam
16:03 🔗 swebb sets mode: +o SadDM
16:15 🔗 godane has joined #archiveteam
16:31 🔗 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
16:32 🔗 Start arkiver: when will trovebox and rapidshare discovery start?
16:32 🔗 dashcloud has joined #archiveteam
16:35 🔗 Start http://superwaitlist.com/google-deprecates-old-webmaster-tools-api/
16:35 🔗 Start google's on a kill spree this week
16:41 🔗 Thynix has quit IRC (Ping timeout: 186 seconds)
16:45 🔗 Start has quit IRC (Disconnected.)
16:49 🔗 primus104 has quit IRC (Leaving.)
16:53 🔗 Start has joined #archiveteam
17:13 🔗 chazchaz_ has joined #archiveteam
17:22 🔗 Start i'm wondering if we should do something like #froogle for yahoo
17:23 🔗 Start maybe #woohoo ?
17:33 🔗 schbirid yanoo
17:34 🔗 Start_ has joined #archiveteam
17:34 🔗 Start has quit IRC (Read error: Connection reset by peer)
17:35 🔗 Start_ is now known as Start
17:38 🔗 Start #ohnoo also would work
17:39 🔗 Start i'll just use woohoo for now
17:39 🔗 Start http://archiveteam.org/index.php?title=Woohoo
17:43 🔗 Start has quit IRC (Disconnected.)
17:59 🔗 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
18:00 🔗 habi has joined #archiveteam
18:01 🔗 dashcloud has joined #archiveteam
18:30 🔗 ionpulse has quit IRC (Ping timeout: 506 seconds)
18:34 🔗 habi has left
18:42 🔗 Start has joined #archiveteam
18:44 🔗 Start_ has joined #archiveteam
18:44 🔗 Start has quit IRC (Read error: Connection reset by peer)
18:46 🔗 schbirid could anyone with a good chunk of space archive this? http://demos.igmdb.org/
18:46 🔗 schbirid best check the sizes first, iirc challenge-tv should be ~70G alone
18:46 🔗 schbirid please tell me if you take this project
18:55 🔗 Emcy has quit IRC (Ping timeout: 362 seconds)
18:56 🔗 Emcy has joined #archiveteam
19:02 🔗 Start_ has quit IRC (Disconnected.)
19:17 🔗 Start has joined #archiveteam
19:27 🔗 Start has quit IRC (Disconnected.)
19:32 🔗 SN4T14 has joined #archiveteam
19:32 🔗 Start has joined #archiveteam
19:45 🔗 VADemon schbirid, I would like to try, I have roughly 800 + 1000GB space
19:46 🔗 schbirid more than enough :)
19:46 🔗 schbirid i think
19:46 🔗 SN4T14_ has joined #archiveteam
19:46 🔗 VADemon this will be my first project then :)
19:46 🔗 schbirid excellent!
19:48 🔗 schbirid "wget -m -a demos.igmdb.org_$(date +%Y%m%d).log http://demos.igmdb.org/" should be all that's needed
19:49 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
19:49 🔗 VADemon what about -r for recursive? I've used wget only a few times
19:50 🔗 SN4T14 has quit IRC (Ping timeout: 306 seconds)
19:51 🔗 xmc VADemon: iirc, -m implies -r
19:51 🔗 xmc -m === -r -N -l inf --no-remove-listing
19:51 🔗 schbirid yeah
19:51 🔗 dashcloud has joined #archiveteam
19:52 🔗 VADemon give me some time guys :)
19:52 🔗 schbirid theoretically for a site like this one would not want the html indexes but hey, they don't hurt either and keeping them is easier
19:55 🔗 primus104 has joined #archiveteam
20:05 🔗 mistym_ has joined #archiveteam
20:10 🔗 mistym has quit IRC (Read error: Operation timed out)
20:14 🔗 VADemon wget.exe --directory-prefix="Y:\archive_demos.igmdb.org\" --append-output "demos.igmdb.org_$(date +%Y%m%d).log" -m http://demos.igmdb.org/
20:14 🔗 BlueMaxim has joined #archiveteam
20:14 🔗 VADemon gives me Scheme missing error
20:16 🔗 schbirid sorry, that $(date) thing is linux only
20:17 🔗 schbirid would result in "20150313"
20:17 🔗 schbirid i just like timestamps ;9
20:17 🔗 Start has quit IRC (Disconnected.)
20:18 🔗 VADemon I thought it would still work so didn't even ask, just worked for me under ubuntu (vm)
20:18 🔗 godane has quit IRC (Quit: Leaving.)
20:21 🔗 VADemon ok it's working, only downloading index.htmls atm
20:25 🔗 VADemon it works, thanks for helping with the setup
20:35 🔗 BlueMaxim has quit IRC (Ping timeout: 512 seconds)
20:36 🔗 BlueMaxim has joined #archiveteam
20:43 🔗 schbirid yay
20:53 🔗 sankin has quit IRC (Leaving.)
20:54 🔗 lag2 has joined #archiveteam
21:29 🔗 godane has joined #archiveteam
21:30 🔗 db48x` has joined #archiveteam
21:45 🔗 dshr has quit IRC (Quit: Page closed)
21:48 🔗 philpem has joined #archiveteam
21:49 🔗 ersi VADemon: Dude, don't run it on windows
21:49 🔗 ersi Jesus god, urgh
21:49 🔗 * ersi shoots self
21:50 🔗 db48x` has quit IRC (Read error: Operation timed out)
21:50 🔗 VADemon ? :/
21:50 🔗 ersi Never do archiving on Windows. Case-insensitive file systems and whatnot..
21:52 🔗 VADemon you're right...
21:52 🔗 db48x` has joined #archiveteam
22:11 🔗 loopholes has joined #archiveteam
22:12 🔗 loopholes Hello?
22:15 🔗 loopholes has quit IRC (Client Quit)
22:17 🔗 www2 has joined #archiveteam
22:17 🔗 loopholes has joined #archiveteam
22:31 🔗 Emcy has quit IRC (Ping timeout: 606 seconds)
22:37 🔗 Emcy has joined #archiveteam
22:49 🔗 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
22:52 🔗 loopholes has quit IRC (Quit: Page closed)
22:52 🔗 schbirid has quit IRC (Leaving)
22:54 🔗 dashcloud has joined #archiveteam
23:03 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
23:11 🔗 dashcloud has joined #archiveteam
23:14 🔗 db48x` has quit IRC (Ping timeout: 258 seconds)
23:17 🔗 dashcloud has quit IRC (Read error: Operation timed out)
23:20 🔗 dashcloud has joined #archiveteam
23:24 🔗 X-Scale has quit IRC (Ping timeout: 240 seconds)
23:24 🔗 Coderjoe has quit IRC (Ping timeout: 606 seconds)
23:26 🔗 Coderjoe has joined #archiveteam
23:34 🔗 dashcloud has quit IRC (Read error: Operation timed out)
23:38 🔗 dashcloud has joined #archiveteam
23:44 🔗 joepie91 VADemon: also you should be using warc
23:45 🔗 VADemon is warc kinda an archive?
23:46 🔗 VADemon Thanks for the suggestion, I will
23:48 🔗 joepie91 VADemon: yep, it's an archive format designed specifically for stuff like archiving websites
23:48 🔗 joepie91 it preserves all the metadata
23:48 🔗 joepie91 request/response headers and all that
23:48 🔗 joepie91 it's supported by wget by default under the --warc-file flag
23:49 🔗 VADemon currently reading this http://www.archiveteam.org/index.php?title=Wget_with_WARC_output
23:49 🔗 VADemon Looks like I have to start the download once again
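Restarting the mirror with WARC output could look roughly like the sketch below. It assumes a wget build with WARC support (1.14 or later) on a Linux shell; the base filename is an arbitrary choice for this example, and the command is only printed so it can be reviewed before starting a crawl of this size.

```shell
site="http://demos.igmdb.org/"
stamp=$(date +%Y%m%d)
base="demos.igmdb.org_${stamp}"   # hypothetical naming, mirrors the log name used earlier

# --mirror            same as -m (-r -N -l inf --no-remove-listing)
# --warc-file=NAME    write requests and responses to NAME.warc.gz
# --warc-cdx          also write a CDX index next to the WARC
# --append-output=LOG same as -a
set -- wget --mirror --warc-file="$base" --warc-cdx \
  --append-output="${base}.log" "$site"

# Print the command for review.
echo "$@"

# Uncomment to actually start the crawl:
# "$@"
```

The mirrored files are still written to disk as before; the WARC is produced alongside them and additionally keeps the request and response headers.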
23:58 🔗 mistym_ has quit IRC (Remote host closed the connection)
23:59 🔗 dashcloud has quit IRC (Read error: Operation timed out)
