#archiveteam 2014-12-02,Tue

Time Nickname Message
00:11 🔗 Ymgve has quit IRC ()
00:30 🔗 LordNigh2 has joined #archiveteam
00:38 🔗 Lord_Nigh has quit IRC (Ping timeout: 600 seconds)
00:38 🔗 LordNigh2 is now known as Lord_Nigh
01:22 🔗 mutoso has joined #archiveteam
01:39 🔗 cf has joined #archiveteam
01:41 🔗 ete_ has joined #archiveteam
01:48 🔗 primus104 has quit IRC (Leaving.)
01:48 🔗 arkhive has joined #archiveteam
01:49 🔗 the_fox has quit IRC (Ping timeout: 335 seconds)
01:49 🔗 mistym has quit IRC (Remote host closed the connection)
01:50 🔗 the_fox has joined #archiveteam
02:13 🔗 Aranje has quit IRC (Read error: Operation timed out)
02:16 🔗 philpem has quit IRC (Ping timeout: 272 seconds)
02:24 🔗 Aranje has joined #archiveteam
02:37 🔗 REiN^ has quit IRC ()
02:37 🔗 REiN^ has joined #archiveteam
02:56 🔗 signius_ has quit IRC (Ping timeout: 258 seconds)
02:57 🔗 ete_ has quit IRC (Remote host closed the connection)
03:09 🔗 mistym has joined #archiveteam
03:09 🔗 signius_ has joined #archiveteam
03:28 🔗 rejon has joined #archiveteam
04:17 🔗 ex-parro1 has quit IRC (Leaving.)
04:28 🔗 ruukasu has quit IRC (Quit: WeeChat 1.0.1)
04:28 🔗 ruukasu has joined #archiveteam
04:29 🔗 ruukasu has quit IRC (Client Quit)
04:29 🔗 ruukasu has joined #archiveteam
04:50 🔗 BlueMaxim has joined #archiveteam
04:55 🔗 aaaaaaaaa has quit IRC (Leaving)
05:03 🔗 todrobbin has joined #archiveteam
05:09 🔗 ruukasu has quit IRC (Quit: WeeChat 1.0.1)
05:10 🔗 mistym has quit IRC (Remote host closed the connection)
05:10 🔗 mistym has joined #archiveteam
05:17 🔗 ruukasu has joined #archiveteam
05:33 🔗 Start is now known as StartAway
05:34 🔗 antomati_ has joined #archiveteam
05:36 🔗 antomatic has quit IRC (Ping timeout: 633 seconds)
05:45 🔗 mistym has quit IRC (Remote host closed the connection)
05:48 🔗 todrobbin has quit IRC (todrobbin)
05:50 🔗 todrobbin has joined #archiveteam
05:56 🔗 todrobbin has quit IRC (Quit: todrobbin)
05:59 🔗 dashcloud has quit IRC (Read error: Operation timed out)
06:06 🔗 dashcloud has joined #archiveteam
06:11 🔗 BiggieJo1 has joined #archiveteam
06:15 🔗 BiggieJon has quit IRC (Read error: Operation timed out)
07:24 🔗 ZorbaTHut has quit IRC (Read error: Connection reset by peer)
07:25 🔗 ZorbaTHut has joined #archiveteam
07:37 🔗 midas SketchCow: do you have a collection ready for Viddy?
07:50 🔗 primus104 has joined #archiveteam
07:57 🔗 dashcloud has quit IRC (Read error: Operation timed out)
08:04 🔗 dashcloud has joined #archiveteam
08:05 🔗 ex-parrot has quit IRC (Read error: Operation timed out)
08:06 🔗 ex-parrot has joined #archiveteam
08:07 🔗 APerti has quit IRC (Read error: Operation timed out)
08:13 🔗 APerti has joined #archiveteam
08:18 🔗 SketchCow Yes. I need to know the user account on IA to grant admin
08:19 🔗 mistym has joined #archiveteam
08:29 🔗 SketchCow Done. archiveteam_viddy is now your victim.
08:30 🔗 mistym has quit IRC (Read error: Operation timed out)
08:30 🔗 SketchCow It has all the proper logo and writing and so on.
08:31 🔗 primus104 has quit IRC (Leaving.)
08:34 🔗 midas thanks SketchCow !
08:54 🔗 amerrykan has quit IRC (Quit: Quitting)
09:26 🔗 APerti has quit IRC (Ping timeout: 480 seconds)
09:27 🔗 amerrykan has joined #archiveteam
09:29 🔗 primus104 has joined #archiveteam
09:36 🔗 primus104 has quit IRC (Leaving.)
10:41 🔗 antomati_ is now known as antomatic
10:48 🔗 dashcloud has quit IRC (Read error: Operation timed out)
10:55 🔗 dashcloud has joined #archiveteam
11:26 🔗 Ymgve has joined #archiveteam
11:38 🔗 ruukasu has quit IRC (Ping timeout: 265 seconds)
12:21 🔗 schbirid has joined #archiveteam
12:21 🔗 Emcy_ has quit IRC (Read error: Connection reset by peer)
12:55 🔗 cf has quit IRC (cf)
13:11 🔗 Morbus has quit IRC (Quit: http://www.disobey.com/)
13:14 🔗 Morbus has joined #archiveteam
13:16 🔗 ruukasu has joined #archiveteam
13:33 🔗 useretail has quit IRC (ircd.shaw.ca irc.shaw.ca)
13:33 🔗 rduser has quit IRC (ircd.shaw.ca irc.shaw.ca)
13:33 🔗 Jogie has quit IRC (ircd.shaw.ca irc.shaw.ca)
13:33 🔗 w0rp has quit IRC (ircd.shaw.ca irc.shaw.ca)
13:33 🔗 SadDM has quit IRC (ircd.shaw.ca irc.shaw.ca)
13:33 🔗 Sellyme has quit IRC (ircd.shaw.ca irc.shaw.ca)
13:33 🔗 w0rp_ has joined #archiveteam
13:34 🔗 sankin has joined #archiveteam
13:34 🔗 Sellyme has joined #archiveteam
13:34 🔗 SadDM has joined #archiveteam
13:35 🔗 rduser has joined #archiveteam
13:42 🔗 primus104 has joined #archiveteam
13:48 🔗 w0rp_ is now known as w0rp
13:49 🔗 sankin has quit IRC (Leaving.)
13:49 🔗 useretail has joined #archiveteam
14:00 🔗 sankin has joined #archiveteam
14:02 🔗 ruukasu has quit IRC (Quit: WeeChat 1.0.1)
14:07 🔗 ruukasu has joined #archiveteam
14:22 🔗 ruukasuu has joined #archiveteam
14:22 🔗 ruukasu has quit IRC (Ping timeout: 265 seconds)
14:23 🔗 ruukasuu has quit IRC (Client Quit)
14:37 🔗 REiN^ has quit IRC ()
14:38 🔗 REiN^ has joined #archiveteam
14:57 🔗 BiggieJo1 is now known as BiggieJon
15:19 🔗 StartAway is now known as Start
15:24 🔗 BiggieJon has left
15:26 🔗 cf has joined #archiveteam
15:34 🔗 mistym has joined #archiveteam
15:34 🔗 mistym has quit IRC (Remote host closed the connection)
15:39 🔗 BiggieJon has joined #archiveteam
15:43 🔗 primus104 has quit IRC (Leaving.)
15:44 🔗 Start has quit IRC (Remote host closed the connection)
15:55 🔗 mistym has joined #archiveteam
15:56 🔗 aaaaaaaaa has joined #archiveteam
16:00 🔗 schbirid privat.t-online.de has a lot of personal homepages, no idea how to discover them all though
16:00 🔗 midas google site:privat.t-online.de ?
16:02 🔗 schbirid yeah but google does not let one paginate anymore after ~25 or something
16:03 🔗 arkiver Google will give you the number of found links, like 1 million, but will only allow you to view 1000
16:16 🔗 thechip has quit IRC (Read error: Connection reset by peer)
16:18 🔗 Emcy has joined #archiveteam
16:23 🔗 chipper_ has joined #archiveteam
16:24 🔗 chipper_ has left
16:34 🔗 SadDM SketchCow: can you move https://archive.org/details/DarkHorseComicsMessageBoards-FinalGrab into the archive team collection when you have a moment?
16:37 🔗 SketchCow Done
16:50 🔗 ruukasu has joined #archiveteam
16:55 🔗 SketchCow Tripod.com is going down
16:56 🔗 arkiver tripod.com
16:56 🔗 SketchCow Maybe
16:58 🔗 xmc ! ?
16:59 🔗 Start_ has joined #archiveteam
16:59 🔗 arkiver Sites aren't hard to save, problem is the discovery of the sites that exist. http://196thovi.tripod.com/
16:59 🔗 xmc somewhere between 25-jun-2014 and today they got rid of <http://team-blog.tripod.com/>, but still link it from the front page
17:00 🔗 xmc http://web.archive.org/web/20140625035208/http://team-blog.tripod.com/
17:00 🔗 arkiver we have various sources (wayback, google, etc.)
17:00 🔗 arkiver but those will most likely not get everything. The wayback just doesn't have all websites and google only shows the first 1000 results
17:01 🔗 DFJustin there's also searching wayback for the old url, http://members.tripod.com/*
17:01 🔗 chfoo http://urlsearch.commoncrawl.org/?q=tripod.com
17:01 🔗 arkiver yeah, I mentioned that
17:02 🔗 arkiver I mean the wayback
17:02 🔗 arkiver not the commoncrawl yet
17:02 🔗 arkiver SketchCow: if you are not able to get a full list of websites some way (they might have some hidden index on their site?), would you like to contact them about this?
17:02 🔗 chfoo we can do a discovery scraping google/bing with a dictionary if that's needed
17:03 🔗 arkiver a dictionary on google?
17:03 🔗 chfoo a word list i mean
17:04 🔗 arkiver Like: site:*.tripod.com *aaa*
17:04 🔗 arkiver site:*.tripod.com *the* etc.?
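The word-list idea discussed above could be sketched roughly as follows. This is only an illustration of generating one site-restricted query per word, in the `site:*.tripod.com *the*` shape arkiver types; the function name `build_queries` and the sample word list are assumptions, and nothing here talks to a search engine or reflects any script the channel actually used.

```python
def build_queries(domain, words):
    """Yield one site-restricted search query per word.

    The query shape mirrors the examples in the log
    ("site:*.tripod.com *the*"); the helper itself is hypothetical.
    """
    for word in words:
        word = word.strip()
        if not word:
            # Skip blank entries so we never emit an empty search term.
            continue
        yield f"site:*.{domain} *{word}*"


if __name__ == "__main__":
    # A tiny made-up word list; a real run would read a dictionary file.
    for query in build_queries("tripod.com", ["aaa", "the"]):
        print(query)
```

Each query would still be capped at the roughly 1000 viewable results mentioned earlier, which is why many narrow queries are used instead of one broad one.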
17:05 🔗 DFJustin hmm it's not accepting my tripod password
17:06 🔗 Start_ should we start a project for http://ep1c.com?
17:06 🔗 Start_ it's also owned by viddy and shutting down on the same date (dec. 15)
17:07 🔗 arkiver Start_: yep, I saw your posts about it (sorry for not responding)
17:07 🔗 DFJustin reset works though
17:09 🔗 dashcloud has quit IRC (Read error: Operation timed out)
17:10 🔗 mistym has quit IRC (Remote host closed the connection)
17:15 🔗 dashcloud has joined #archiveteam
17:22 🔗 ruukasu has quit IRC (Ping timeout: 265 seconds)
17:23 🔗 schbirid not tripod :((
17:24 🔗 schbirid http://members.tripod.com/robots.txt has sitemaps
17:24 🔗 Start_ is now known as Start
17:25 🔗 schbirid whoever does this, please grab angelfire in the same go. same sitemap structure
17:25 🔗 schbirid also please educate me how you do it, because i got stuck with angelfire and got no help
17:26 🔗 arkiver thanks for those sitemaps
17:26 🔗 arkiver I'll create a discovery project which will find all the sites using those sitemaps
17:26 🔗 schbirid all URLs are inside the sitemaps
17:27 🔗 arkiver yes
17:28 🔗 schbirid just not the media embedded in those sites, that was my problem
17:29 🔗 arkiver schbirid: do you have an example for me?
17:29 🔗 arkiver and were you using wget lua?
17:29 🔗 schbirid http://members.tripod.com/a1modularhomes/sitemap.xml random
17:29 🔗 schbirid nope
17:29 🔗 schbirid i gave up because i would have made a mess
17:29 🔗 lbft has quit IRC (Read error: Operation timed out)
17:30 🔗 arkiver do you mean by "the media embedded in those sites" external pictures and videos?
17:30 🔗 schbirid the sitemaps only have html pages
17:30 🔗 schbirid so any images etc need to be found
17:30 🔗 arkiver I see what you mean now, sorry
17:30 🔗 schbirid :)
17:30 🔗 arkiver Yeah, I'll get those done by wget lua
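A minimal sketch of the sitemap-based discovery being discussed, assuming the robots.txt file lists sitemaps with the usual `Sitemap:` lines and that each sitemap is standard sitemaps.org XML with `<loc>` elements. The function names and the sample page URL are hypothetical, the code works on text already fetched, and it is not the wget-lua script arkiver mentions.

```python
import xml.etree.ElementTree as ET


def sitemaps_from_robots(robots_txt):
    """Return the URLs named on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def locs_from_sitemap(xml_text):
    """Return the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    locs = []
    for element in root.iter():
        # Tags look like "{namespace}loc"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            locs.append(element.text.strip())
    return locs


if __name__ == "__main__":
    robots = (
        "User-agent: *\n"
        "Sitemap: http://members.tripod.com/a1modularhomes/sitemap.xml\n"
    )
    # Made-up sitemap body for illustration only.
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://example.com/page.html</loc></url>"
        "</urlset>"
    )
    print(sitemaps_from_robots(robots))
    print(locs_from_sitemap(sitemap))
```

As schbirid points out, the sitemaps only list HTML pages, so a list built this way is a starting point; images and other embedded media would still have to be found while crawling each page.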
17:31 🔗 aaaaaaaaa maybe there should be a tripod channel
17:31 🔗 arkiver Maybe it's not going to be a discovery project btw
17:31 🔗 arkiver but we'll see
17:32 🔗 garyrh #wobbly ?
17:33 🔗 arkiver SketchCow: do we have the shutdown date?
17:33 🔗 lbft has joined #archiveteam
17:34 🔗 SketchCow No, and there's a chance this tip may have just come from someone finding what you did - the site seems really on the rack, blog no longer works, etc.
17:34 🔗 mistym has joined #archiveteam
17:38 🔗 aaaaaaaaa #byepod is what I was thinking
17:43 🔗 philpem has joined #archiveteam
17:50 🔗 Start has quit IRC (Ping timeout: 272 seconds)
18:05 🔗 Jogie has joined #archiveteam
18:09 🔗 APerti has joined #archiveteam
18:10 🔗 rejon has quit IRC (Ping timeout: 480 seconds)
18:35 🔗 cf_ has joined #archiveteam
18:36 🔗 cf has quit IRC (Ping timeout: 246 seconds)
18:36 🔗 cf_ is now known as cf
18:39 🔗 primus104 has joined #archiveteam
18:43 🔗 thechip has joined #archiveteam
18:56 🔗 thechip has quit IRC (Read error: Operation timed out)
19:03 🔗 arkiver SketchCow: ok if I wait till there is more information on the shutdown before I get the scripts ready?
19:04 🔗 Sk2d has joined #archiveteam
19:04 🔗 Sk1d has quit IRC (Read error: Operation timed out)
19:04 🔗 Sk2d is now known as Sk1d
19:11 🔗 Sk1d has quit IRC (Ping timeout: 265 seconds)
19:11 🔗 dashcloud has quit IRC (Read error: Operation timed out)
19:14 🔗 Sk1d has joined #archiveteam
19:21 🔗 dashcloud has joined #archiveteam
19:23 🔗 primus104 has quit IRC (Leaving.)
19:30 🔗 dashcloud has quit IRC (Remote host closed the connection)
19:31 🔗 dashcloud has joined #archiveteam
19:32 🔗 arkiver midas: http://dat.serveert.me.uk/p/ftp
19:32 🔗 arkiver is currently down :/
19:34 🔗 SketchCow Yes, please do.
19:35 🔗 arkiver ok
19:42 🔗 Start has joined #archiveteam
19:43 🔗 ruukasu has joined #archiveteam
19:46 🔗 bauruine has quit IRC (Ping timeout: 265 seconds)
19:48 🔗 philpem has quit IRC (Ping timeout: 272 seconds)
19:51 🔗 bauruine has joined #archiveteam
20:02 🔗 Start https://roon.io
20:02 🔗 Start http://blog.ghost.org/roon/
20:02 🔗 Start "The Roon.io hosted platform will be closing its doors on December 31st, 2014."
20:03 🔗 Kniffy has quit IRC (Quit: pup)
20:03 🔗 thechip has joined #archiveteam
20:05 🔗 Kniffy has joined #archiveteam
20:08 🔗 ruukasu has quit IRC (Ping timeout: 265 seconds)
20:16 🔗 Start here's a google crawl for roon: http://paste.archivingyoursh.it/goxowihalo.avrasm
20:19 🔗 SN4T14 has quit IRC (Ping timeout: 369 seconds)
20:19 🔗 Start looks like roon can be sequentially scraped through its api: https://roon.io/developer/blogs
20:28 🔗 Start has quit IRC (Read error: Connection reset by peer)
20:32 🔗 Start has joined #archiveteam
20:34 🔗 primus104 has joined #archiveteam
20:37 🔗 arkiver SketchCow: a fast and small project is starting very soon: ziplist
20:37 🔗 arkiver #zipyourlips
20:37 🔗 ex-parro1 has joined #archiveteam
20:37 🔗 arkiver That one is going to FOS, currently 30.000 warc's
20:42 🔗 dashcloud has quit IRC (Read error: Operation timed out)
20:43 🔗 Start cf: since you've been doing API scrapes for a couple recent projects, mind doing one for roon?
20:44 🔗 Start http://archiveteam.org/index.php?title=Roon
20:44 🔗 cf Start: I’ll have a go at it. Not sure when I’ll get around to it, but within a week or so
20:45 🔗 arkiver Start: are those api's just incremental numbered?
20:45 🔗 Start yes
20:45 🔗 dashcloud has joined #archiveteam
20:45 🔗 arkiver then I'll do them in the scripts, we also save the api urls that way
20:45 🔗 Start ok
20:45 🔗 cf Yea, just about to say
20:46 🔗 Start we need an irc channel name for roon
20:46 🔗 Start #rooin
20:48 🔗 Start or maybe #rooined
20:49 🔗 Start i like rooined better
21:01 🔗 T31M has quit IRC (Quit: Leaving)
21:09 🔗 aaaaaaaaa has quit IRC (Leaving)
21:09 🔗 aaaaaaaaa has joined #archiveteam
21:25 🔗 cf has quit IRC (Ping timeout: 265 seconds)
21:26 🔗 Start_ has joined #archiveteam
21:27 🔗 Start has quit IRC (Read error: Connection reset by peer)
21:28 🔗 midas arkiver: ill fix it in a minute
21:36 🔗 midas fixed
21:36 🔗 midas forgot it rebooted this box
21:36 🔗 bauruine has quit IRC (Ping timeout: 265 seconds)
21:36 🔗 K4k has joined #archiveteam
21:38 🔗 xk_id has quit IRC (Read error: Operation timed out)
21:42 🔗 bauruine has joined #archiveteam
21:54 🔗 arkiver thanks midas
21:56 🔗 SN4T14 has joined #archiveteam
21:57 🔗 sankin has quit IRC (Leaving.)
22:00 🔗 hive-mind has quit IRC (Ping timeout: 272 seconds)
22:07 🔗 ruukasu has joined #archiveteam
22:09 🔗 cbb has joined #archiveteam
22:11 🔗 thechip has quit IRC (Quit: Leaving...)
22:12 🔗 hive-mind has joined #archiveteam
22:14 🔗 Start_ is now known as Start
22:17 🔗 dashcloud has quit IRC (Read error: Operation timed out)
22:20 🔗 dashcloud has joined #archiveteam
22:22 🔗 Start has quit IRC (Read error: Connection reset by peer)
22:24 🔗 arkiver SketchCow: ziplist should be incoming on FOS now, 30.000 warc's
22:24 🔗 Start has joined #archiveteam
22:25 🔗 K4k has quit IRC (Ping timeout: 378 seconds)
22:25 🔗 schbirid has quit IRC (Leaving)
22:27 🔗 cf has joined #archiveteam
22:40 🔗 REiN^ has quit IRC (Read error: Connection reset by peer)
22:41 🔗 REiN^ has joined #archiveteam
22:46 🔗 Start arkiver: the highest valid roon blog i could find was: https://roon.io/api/v1/blogs/122233
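Since the blog IDs are said to be incrementally numbered, the list of API URLs to fetch could be produced along these lines. This is a sketch under assumptions: the URL pattern is the one Start pasted, the upper bound 122233 is the highest valid ID Start reports, and starting at ID 1 plus the helper names `roon_blog_urls` and `batches` are guesses made for illustration. It only builds URL lists and does no fetching.

```python
def roon_blog_urls(last_id, first_id=1, base="https://roon.io/api/v1/blogs/"):
    """Yield one API URL per blog ID in [first_id, last_id].

    first_id=1 is an assumption; the log only gives the highest ID found.
    """
    for blog_id in range(first_id, last_id + 1):
        yield f"{base}{blog_id}"


def batches(items, size):
    """Group an iterable into lists of at most `size` items (work units)."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


if __name__ == "__main__":
    # 122233 is the highest valid blog ID mentioned in the log.
    for work_unit in batches(roon_blog_urls(122233), 1000):
        print(work_unit[0], "...", work_unit[-1])
```

Splitting the ID range into fixed-size batches is one simple way to hand out work; gaps in the ID space would simply show up as failed requests for whoever runs the grab.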
22:47 🔗 REiN^ has quit IRC (Read error: Connection reset by peer)
22:47 🔗 arkiver Start: thanks, but first ep1c
22:47 🔗 arkiver Tomorrow is ep1c day
22:47 🔗 Start ok
22:48 🔗 Start i'm guessing that ep1c's grab scripts will be very similar to viddy's?
22:48 🔗 REiN^ has joined #archiveteam
22:49 🔗 signius_ has quit IRC (Read error: Operation timed out)
22:50 🔗 arkiver probably, but I'll see that tomorrow
23:02 🔗 signius_ has joined #archiveteam
23:07 🔗 ex-parro1 has quit IRC (Remote host closed the connection)
23:07 🔗 dashcloud so tripod really is going down?
23:08 🔗 Start has quit IRC (Ping timeout: 378 seconds)
23:08 🔗 garyrh maaaybe
23:08 🔗 dashcloud might as well grab angelfire while we're at it- you'd then have the three big players from the 90s
23:10 🔗 xmc does lycos still host homepages?
23:11 🔗 REiN^ has quit IRC (Read error: Operation timed out)
23:15 🔗 dashcloud yeah- there's classic 90s Angelfire: http://www.angelfire.com/sd/ScrewAOL/ and the modern Angelfire: http://www.angelfire.lycos.com/
23:17 🔗 ex-parro1 has joined #archiveteam
23:26 🔗 dashcloud so modern angelfire is probably not too hard to archive because there's sitemaps listing the pages: http://www.angelfire.com/sitemap-index-00.xml.gz
23:26 🔗 dashcloud I started that project but when wget got killed because of memkiller, I stopped
23:50 🔗 cf has quit IRC (Quit: cf)
23:57 🔗 godane so some of the KBS News Today i got are incomplete
23:58 🔗 godane doing a 2nd rtmpdump gets me a bigger file
23:58 🔗 godane i must have been doing too many at once
