Time |
Nickname |
Message |
00:11
🔗
|
|
Ymgve has quit IRC () |
00:30
🔗
|
|
LordNigh2 has joined #archiveteam |
00:38
🔗
|
|
Lord_Nigh has quit IRC (Ping timeout: 600 seconds) |
00:38
🔗
|
|
LordNigh2 is now known as Lord_Nigh |
01:22
🔗
|
|
mutoso has joined #archiveteam |
01:39
🔗
|
|
cf has joined #archiveteam |
01:41
🔗
|
|
ete_ has joined #archiveteam |
01:48
🔗
|
|
primus104 has quit IRC (Leaving.) |
01:48
🔗
|
|
arkhive has joined #archiveteam |
01:49
🔗
|
|
the_fox has quit IRC (Ping timeout: 335 seconds) |
01:49
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
01:50
🔗
|
|
the_fox has joined #archiveteam |
02:13
🔗
|
|
Aranje has quit IRC (Read error: Operation timed out) |
02:16
🔗
|
|
philpem has quit IRC (Ping timeout: 272 seconds) |
02:24
🔗
|
|
Aranje has joined #archiveteam |
02:37
🔗
|
|
REiN^ has quit IRC () |
02:37
🔗
|
|
REiN^ has joined #archiveteam |
02:56
🔗
|
|
signius_ has quit IRC (Ping timeout: 258 seconds) |
02:57
🔗
|
|
ete_ has quit IRC (Remote host closed the connection) |
03:09
🔗
|
|
mistym has joined #archiveteam |
03:09
🔗
|
|
signius_ has joined #archiveteam |
03:28
🔗
|
|
rejon has joined #archiveteam |
04:17
🔗
|
|
ex-parro1 has quit IRC (Leaving.) |
04:28
🔗
|
|
ruukasu has quit IRC (Quit: WeeChat 1.0.1) |
04:28
🔗
|
|
ruukasu has joined #archiveteam |
04:29
🔗
|
|
ruukasu has quit IRC (Client Quit) |
04:29
🔗
|
|
ruukasu has joined #archiveteam |
04:50
🔗
|
|
BlueMaxim has joined #archiveteam |
04:55
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
05:03
🔗
|
|
todrobbin has joined #archiveteam |
05:09
🔗
|
|
ruukasu has quit IRC (Quit: WeeChat 1.0.1) |
05:10
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
05:10
🔗
|
|
mistym has joined #archiveteam |
05:17
🔗
|
|
ruukasu has joined #archiveteam |
05:33
🔗
|
|
Start is now known as StartAway |
05:34
🔗
|
|
antomati_ has joined #archiveteam |
05:36
🔗
|
|
antomatic has quit IRC (Ping timeout: 633 seconds) |
05:45
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
05:48
🔗
|
|
todrobbin has quit IRC (todrobbin) |
05:50
🔗
|
|
todrobbin has joined #archiveteam |
05:56
🔗
|
|
todrobbin has quit IRC (Quit: todrobbin) |
05:59
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
06:06
🔗
|
|
dashcloud has joined #archiveteam |
06:11
🔗
|
|
BiggieJo1 has joined #archiveteam |
06:15
🔗
|
|
BiggieJon has quit IRC (Read error: Operation timed out) |
07:24
🔗
|
|
ZorbaTHut has quit IRC (Read error: Connection reset by peer) |
07:25
🔗
|
|
ZorbaTHut has joined #archiveteam |
07:37
🔗
|
midas |
SketchCow: do you have a collection ready for Viddy? |
07:50
🔗
|
|
primus104 has joined #archiveteam |
07:57
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
08:04
🔗
|
|
dashcloud has joined #archiveteam |
08:05
🔗
|
|
ex-parrot has quit IRC (Read error: Operation timed out) |
08:06
🔗
|
|
ex-parrot has joined #archiveteam |
08:07
🔗
|
|
APerti has quit IRC (Read error: Operation timed out) |
08:13
🔗
|
|
APerti has joined #archiveteam |
08:18
🔗
|
SketchCow |
Yes. I need to know the user account on IA to grant admin |
08:19
🔗
|
|
mistym has joined #archiveteam |
08:29
🔗
|
SketchCow |
Done. archiveteam_viddy is now your victim. |
08:30
🔗
|
|
mistym has quit IRC (Read error: Operation timed out) |
08:30
🔗
|
SketchCow |
It has all the proper logo and writing and so on. |
08:31
🔗
|
|
primus104 has quit IRC (Leaving.) |
08:34
🔗
|
midas |
thanks SketchCow ! |
08:54
🔗
|
|
amerrykan has quit IRC (Quit: Quitting) |
09:26
🔗
|
|
APerti has quit IRC (Ping timeout: 480 seconds) |
09:27
🔗
|
|
amerrykan has joined #archiveteam |
09:29
🔗
|
|
primus104 has joined #archiveteam |
09:36
🔗
|
|
primus104 has quit IRC (Leaving.) |
10:41
🔗
|
|
antomati_ is now known as antomatic |
10:48
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
10:55
🔗
|
|
dashcloud has joined #archiveteam |
11:26
🔗
|
|
Ymgve has joined #archiveteam |
11:38
🔗
|
|
ruukasu has quit IRC (Ping timeout: 265 seconds) |
12:21
🔗
|
|
schbirid has joined #archiveteam |
12:21
🔗
|
|
Emcy_ has quit IRC (Read error: Connection reset by peer) |
12:55
🔗
|
|
cf has quit IRC (cf) |
13:11
🔗
|
|
Morbus has quit IRC (Quit: http://www.disobey.com/) |
13:14
🔗
|
|
Morbus has joined #archiveteam |
13:16
🔗
|
|
ruukasu has joined #archiveteam |
13:33
🔗
|
|
useretail has quit IRC (ircd.shaw.ca irc.shaw.ca) |
13:33
🔗
|
|
rduser has quit IRC (ircd.shaw.ca irc.shaw.ca) |
13:33
🔗
|
|
Jogie has quit IRC (ircd.shaw.ca irc.shaw.ca) |
13:33
🔗
|
|
w0rp has quit IRC (ircd.shaw.ca irc.shaw.ca) |
13:33
🔗
|
|
SadDM has quit IRC (ircd.shaw.ca irc.shaw.ca) |
13:33
🔗
|
|
Sellyme has quit IRC (ircd.shaw.ca irc.shaw.ca) |
13:33
🔗
|
|
w0rp_ has joined #archiveteam |
13:34
🔗
|
|
sankin has joined #archiveteam |
13:34
🔗
|
|
Sellyme has joined #archiveteam |
13:34
🔗
|
|
SadDM has joined #archiveteam |
13:35
🔗
|
|
rduser has joined #archiveteam |
13:42
🔗
|
|
primus104 has joined #archiveteam |
13:48
🔗
|
|
w0rp_ is now known as w0rp |
13:49
🔗
|
|
sankin has quit IRC (Leaving.) |
13:49
🔗
|
|
useretail has joined #archiveteam |
14:00
🔗
|
|
sankin has joined #archiveteam |
14:02
🔗
|
|
ruukasu has quit IRC (Quit: WeeChat 1.0.1) |
14:07
🔗
|
|
ruukasu has joined #archiveteam |
14:22
🔗
|
|
ruukasuu has joined #archiveteam |
14:22
🔗
|
|
ruukasu has quit IRC (Ping timeout: 265 seconds) |
14:23
🔗
|
|
ruukasuu has quit IRC (Client Quit) |
14:37
🔗
|
|
REiN^ has quit IRC () |
14:38
🔗
|
|
REiN^ has joined #archiveteam |
14:57
🔗
|
|
BiggieJo1 is now known as BiggieJon |
15:19
🔗
|
|
StartAway is now known as Start |
15:24
🔗
|
|
BiggieJon has left |
15:26
🔗
|
|
cf has joined #archiveteam |
15:34
🔗
|
|
mistym has joined #archiveteam |
15:34
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
15:39
🔗
|
|
BiggieJon has joined #archiveteam |
15:43
🔗
|
|
primus104 has quit IRC (Leaving.) |
15:44
🔗
|
|
Start has quit IRC (Remote host closed the connection) |
15:55
🔗
|
|
mistym has joined #archiveteam |
15:56
🔗
|
|
aaaaaaaaa has joined #archiveteam |
16:00
🔗
|
schbirid |
privat.t-online.de has a lot of personal homepages, no idea how to discover them all though |
16:00
🔗
|
midas |
google site:privat.t-online.de ? |
16:02
🔗
|
schbirid |
yeah but google does not let one paginate anymore after ~25 or something |
16:03
🔗
|
arkiver |
Google will give you the number of found links, like 1 million, but will only allow you to view 1000 |
16:16
🔗
|
|
thechip has quit IRC (Read error: Connection reset by peer) |
16:18
🔗
|
|
Emcy has joined #archiveteam |
16:23
🔗
|
|
chipper_ has joined #archiveteam |
16:24
🔗
|
|
chipper_ has left |
16:34
🔗
|
SadDM |
SketchCow: can you move https://archive.org/details/DarkHorseComicsMessageBoards-FinalGrab into the archive team collection when you have a moment? |
16:37
🔗
|
SketchCow |
Done |
16:50
🔗
|
|
ruukasu has joined #archiveteam |
16:55
🔗
|
SketchCow |
Tripod.com is going down |
16:56
🔗
|
arkiver |
tripod.com |
16:56
🔗
|
SketchCow |
Maybe |
16:58
🔗
|
xmc |
! ? |
16:59
🔗
|
|
Start_ has joined #archiveteam |
16:59
🔗
|
arkiver |
Sites aren't hard to save, problem is the discovery of the sites that exist. http://196thovi.tripod.com/ |
16:59
🔗
|
xmc |
somewhere between 25-jun-2014 and today they got rid of <http://team-blog.tripod.com/>, but still link it from the front page |
17:00
🔗
|
xmc |
http://web.archive.org/web/20140625035208/http://team-blog.tripod.com/ |
17:00
🔗
|
arkiver |
we have various sources (wayback, google, etc.) |
17:00
🔗
|
arkiver |
but those will most likely not get everything. The wayback just doesn't have all websites and google only shows the first 1000 results |
17:01
🔗
|
DFJustin |
there's also searching wayback for the old url, http://members.tripod.com/* |
17:01
🔗
|
chfoo |
http://urlsearch.commoncrawl.org/?q=tripod.com |
17:01
🔗
|
arkiver |
yeah, I mentioned that |
17:02
🔗
|
arkiver |
I mean the wayback |
17:02
🔗
|
arkiver |
not the commoncrawl yet |
17:02
🔗
|
arkiver |
SketchCow: if you are not able to get a full list of websites some way (they might have some hidden index on their site?), would you like to contact them about this? |
17:02
🔗
|
chfoo |
we can do a discovery scraping google/bing with a dictionary if that's needed |
17:03
🔗
|
arkiver |
a dictionary on google? |
17:03
🔗
|
chfoo |
a word list i mean |
17:04
🔗
|
arkiver |
Like: site:*.tripod.com *aaa* |
17:04
🔗
|
arkiver |
site:*.tripod.com *the* etc.? |
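The word-list idea chfoo and arkiver discuss above could be sketched roughly as follows. This is only an illustration: `build_site_queries` is a hypothetical helper, the sample words are made up, and the code only builds query strings. It does not contact Google or Bing, and it does not get around the result cap mentioned earlier.

```python
def build_site_queries(site_pattern, words):
    """Return one search query per distinct word, restricted to site_pattern."""
    queries = []
    seen = set()
    for word in words:
        term = word.strip().lower()
        # Skip blanks and words we have already turned into a query.
        if not term or term in seen:
            continue
        seen.add(term)
        queries.append(f"site:{site_pattern} {term}")
    return queries


if __name__ == "__main__":
    # Hypothetical word list; a real run would read a dictionary file.
    for query in build_site_queries("*.tripod.com", ["aaa", "the", "The", ""]):
        print(query)
```

Each query can surface a different slice of results, so the union over many words may reach sites that a single `site:` search cuts off.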
17:05
🔗
|
DFJustin |
hmm it's not accepting my tripod password |
17:06
🔗
|
Start_ |
should we start a project for http://ep1c.com? |
17:06
🔗
|
Start_ |
it's also owned by viddy and shutting down on the same date (dec. 15) |
17:07
🔗
|
arkiver |
Start_: yep, I saw your posts about it (sorry for not responding) |
17:07
🔗
|
DFJustin |
reset works though |
17:09
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
17:10
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
17:15
🔗
|
|
dashcloud has joined #archiveteam |
17:22
🔗
|
|
ruukasu has quit IRC (Ping timeout: 265 seconds) |
17:23
🔗
|
schbirid |
not tripod :(( |
17:24
🔗
|
schbirid |
http://members.tripod.com/robots.txt has sitemaps |
17:24
🔗
|
|
Start_ is now known as Start |
17:25
🔗
|
schbirid |
whoever does this, please grab angelfire in the same go. same sitemap structure |
17:25
🔗
|
schbirid |
also please educate me how you do it, because i got stuck with angelfire and got no help |
17:26
🔗
|
arkiver |
thanks for those sitemaps |
17:26
🔗
|
arkiver |
I'll create a discovery project which will find all the sites using those sitemaps |
17:26
🔗
|
schbirid |
all URLs are inside the sitemaps |
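Pulling the page URLs out of one of those sitemaps might look like the sketch below. It assumes the files follow the standard sitemaps.org XML layout (not confirmed in the log); `extract_locs` is a hypothetical helper and the sample page paths are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol (assumed to apply here).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(sitemap_xml):
    """Return the text of every <loc> element in a sitemap or sitemap index."""
    root = ET.fromstring(sitemap_xml)
    locs = []
    for element in root.iter():
        # Accept both namespaced and un-namespaced <loc> tags.
        if element.tag in (SITEMAP_NS + "loc", "loc") and element.text:
            locs.append(element.text.strip())
    return locs


# Illustrative sample only; the page names are hypothetical.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://members.tripod.com/a1modularhomes/index.html</loc></url>
  <url><loc>http://members.tripod.com/a1modularhomes/contact.html</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_locs(SAMPLE_SITEMAP):
        print(url)
```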
17:27
🔗
|
arkiver |
yes |
17:28
🔗
|
schbirid |
just not the media embedded in those sites, that was my problem |
17:29
🔗
|
arkiver |
schbirid: do you have an example for me? |
17:29
🔗
|
arkiver |
and were you using wget lua? |
17:29
🔗
|
schbirid |
http://members.tripod.com/a1modularhomes/sitemap.xml random |
17:29
🔗
|
schbirid |
nope |
17:29
🔗
|
schbirid |
i gave up because i would have made a mess |
17:29
🔗
|
|
lbft has quit IRC (Read error: Operation timed out) |
17:30
🔗
|
arkiver |
do you mean by "the media embedded in those sites" external pictures and videos? |
17:30
🔗
|
schbirid |
the sitemaps only have html pages |
17:30
🔗
|
schbirid |
so any images etc need to be found |
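Since the sitemaps list only HTML pages, the embedded files have to be discovered from the pages themselves. The sketch below shows one way to do that with Python's standard library. It is not the wget-lua approach arkiver mentions; `MediaLinkParser` and `find_embedded_media` are hypothetical names, and the tag list is a small assumed subset, not a complete one.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MediaLinkParser(HTMLParser):
    """Collect absolute URLs of resources that a page embeds."""

    # Tag -> attribute carrying the resource URL (assumed, not exhaustive).
    URL_ATTRS = {"img": "src", "script": "src", "embed": "src",
                 "source": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative paths against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def find_embedded_media(base_url, html):
    """Return embedded resource URLs found in html, in document order."""
    parser = MediaLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Plain `<a href>` links are ignored on purpose here, because the sitemap is already assumed to cover the HTML pages.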
17:30
🔗
|
arkiver |
I see what you mean now, sorry |
17:30
🔗
|
schbirid |
:) |
17:30
🔗
|
arkiver |
Yeah, I'll get those done by wget lua |
17:31
🔗
|
aaaaaaaaa |
maybe there should be a tripod channel |
17:31
🔗
|
arkiver |
Maybe it's not going to be a discovery project btw |
17:31
🔗
|
arkiver |
but we'll see |
17:32
🔗
|
garyrh |
#wobbly ? |
17:33
🔗
|
arkiver |
SketchCow: do we have the shutdown date? |
17:33
🔗
|
|
lbft has joined #archiveteam |
17:34
🔗
|
SketchCow |
No, and there's a chance this tip may have just come from someone finding what you did - the site seems really on the rack, blog no longer works, etc. |
17:34
🔗
|
|
mistym has joined #archiveteam |
17:38
🔗
|
aaaaaaaaa |
#byepod is what I was thinking |
17:43
🔗
|
|
philpem has joined #archiveteam |
17:50
🔗
|
|
Start has quit IRC (Ping timeout: 272 seconds) |
18:05
🔗
|
|
Jogie has joined #archiveteam |
18:09
🔗
|
|
APerti has joined #archiveteam |
18:10
🔗
|
|
rejon has quit IRC (Ping timeout: 480 seconds) |
18:35
🔗
|
|
cf_ has joined #archiveteam |
18:36
🔗
|
|
cf has quit IRC (Ping timeout: 246 seconds) |
18:36
🔗
|
|
cf_ is now known as cf |
18:39
🔗
|
|
primus104 has joined #archiveteam |
18:43
🔗
|
|
thechip has joined #archiveteam |
18:56
🔗
|
|
thechip has quit IRC (Read error: Operation timed out) |
19:03
🔗
|
arkiver |
SketchCow: ok if I wait till there is more information on the shutdown before I get the scripts ready? |
19:04
🔗
|
|
Sk2d has joined #archiveteam |
19:04
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
19:04
🔗
|
|
Sk2d is now known as Sk1d |
19:11
🔗
|
|
Sk1d has quit IRC (Ping timeout: 265 seconds) |
19:11
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
19:14
🔗
|
|
Sk1d has joined #archiveteam |
19:21
🔗
|
|
dashcloud has joined #archiveteam |
19:23
🔗
|
|
primus104 has quit IRC (Leaving.) |
19:30
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |
19:31
🔗
|
|
dashcloud has joined #archiveteam |
19:32
🔗
|
arkiver |
midas: http://dat.serveert.me.uk/p/ftp |
19:32
🔗
|
arkiver |
is currently down :/ |
19:34
🔗
|
SketchCow |
Yes, please do. |
19:35
🔗
|
arkiver |
ok |
19:42
🔗
|
|
Start has joined #archiveteam |
19:43
🔗
|
|
ruukasu has joined #archiveteam |
19:46
🔗
|
|
bauruine has quit IRC (Ping timeout: 265 seconds) |
19:48
🔗
|
|
philpem has quit IRC (Ping timeout: 272 seconds) |
19:51
🔗
|
|
bauruine has joined #archiveteam |
20:02
🔗
|
Start |
https://roon.io |
20:02
🔗
|
Start |
http://blog.ghost.org/roon/ |
20:02
🔗
|
Start |
"The Roon.io hosted platform will be closing its doors on December 31st, 2014." |
20:03
🔗
|
|
Kniffy has quit IRC (Quit: pup) |
20:03
🔗
|
|
thechip has joined #archiveteam |
20:05
🔗
|
|
Kniffy has joined #archiveteam |
20:08
🔗
|
|
ruukasu has quit IRC (Ping timeout: 265 seconds) |
20:16
🔗
|
Start |
here's a google crawl for roon: http://paste.archivingyoursh.it/goxowihalo.avrasm |
20:19
🔗
|
|
SN4T14 has quit IRC (Ping timeout: 369 seconds) |
20:19
🔗
|
Start |
looks like roon can be sequentially scraped through its api: https://roon.io/developer/blogs |
20:28
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
20:32
🔗
|
|
Start has joined #archiveteam |
20:34
🔗
|
|
primus104 has joined #archiveteam |
20:37
🔗
|
arkiver |
SketchCow: a fast and small project is starting very soon: ziplist |
20:37
🔗
|
arkiver |
#zipyourlips |
20:37
🔗
|
|
ex-parro1 has joined #archiveteam |
20:37
🔗
|
arkiver |
That one is going to FOS, currently 30,000 WARCs |
20:42
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
20:43
🔗
|
Start |
cf: since you've been doing API scrapes for a couple recent projects, mind doing one for roon? |
20:44
🔗
|
Start |
http://archiveteam.org/index.php?title=Roon |
20:44
🔗
|
cf |
Start: I’ll have a go at it. Not sure when I’ll get around to it, but within a week or so |
20:45
🔗
|
arkiver |
Start: are those APIs just incrementally numbered? |
20:45
🔗
|
Start |
yes |
20:45
🔗
|
|
dashcloud has joined #archiveteam |
20:45
🔗
|
arkiver |
then I'll do them in the scripts, we also save the api urls that way |
20:45
🔗
|
Start |
ok |
20:45
🔗
|
cf |
Yea, just about to say |
20:46
🔗
|
Start |
we need an irc channel name for roon |
20:46
🔗
|
Start |
#rooin |
20:48
🔗
|
Start |
or maybe #rooined |
20:49
🔗
|
Start |
i like rooined better |
21:01
🔗
|
|
T31M has quit IRC (Quit: Leaving) |
21:09
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
21:09
🔗
|
|
aaaaaaaaa has joined #archiveteam |
21:25
🔗
|
|
cf has quit IRC (Ping timeout: 265 seconds) |
21:26
🔗
|
|
Start_ has joined #archiveteam |
21:27
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
21:28
🔗
|
midas |
arkiver: ill fix it in a minute |
21:36
🔗
|
midas |
fixed |
21:36
🔗
|
midas |
forgot i rebooted this box |
21:36
🔗
|
|
bauruine has quit IRC (Ping timeout: 265 seconds) |
21:36
🔗
|
|
K4k has joined #archiveteam |
21:38
🔗
|
|
xk_id has quit IRC (Read error: Operation timed out) |
21:42
🔗
|
|
bauruine has joined #archiveteam |
21:54
🔗
|
arkiver |
thanks midas |
21:56
🔗
|
|
SN4T14 has joined #archiveteam |
21:57
🔗
|
|
sankin has quit IRC (Leaving.) |
22:00
🔗
|
|
hive-mind has quit IRC (Ping timeout: 272 seconds) |
22:07
🔗
|
|
ruukasu has joined #archiveteam |
22:09
🔗
|
|
cbb has joined #archiveteam |
22:11
🔗
|
|
thechip has quit IRC (Quit: Leaving...) |
22:12
🔗
|
|
hive-mind has joined #archiveteam |
22:14
🔗
|
|
Start_ is now known as Start |
22:17
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
22:20
🔗
|
|
dashcloud has joined #archiveteam |
22:22
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
22:24
🔗
|
arkiver |
SketchCow: ziplist should be incoming on FOS now, 30,000 WARCs |
22:24
🔗
|
|
Start has joined #archiveteam |
22:25
🔗
|
|
K4k has quit IRC (Ping timeout: 378 seconds) |
22:25
🔗
|
|
schbirid has quit IRC (Leaving) |
22:27
🔗
|
|
cf has joined #archiveteam |
22:40
🔗
|
|
REiN^ has quit IRC (Read error: Connection reset by peer) |
22:41
🔗
|
|
REiN^ has joined #archiveteam |
22:46
🔗
|
Start |
arkiver: the highest valid roon blog i could find was: https://roon.io/api/v1/blogs/122233 |
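With sequential IDs and the upper bound Start reports, the list of API URLs to fetch could be generated along these lines. This is a sketch under assumptions: the URL pattern is taken from the message above, starting at ID 1 is a guess, and `roon_blog_api_urls` is a hypothetical helper that only builds URLs without fetching anything.

```python
ROON_BLOG_API = "https://roon.io/api/v1/blogs/{blog_id}"
HIGHEST_KNOWN_ID = 122233  # highest valid blog ID reported in the log


def roon_blog_api_urls(first_id=1, last_id=HIGHEST_KNOWN_ID):
    """Yield the API URL for every blog ID in [first_id, last_id]."""
    if first_id < 1 or last_id < first_id:
        raise ValueError("expected 1 <= first_id <= last_id")
    for blog_id in range(first_id, last_id + 1):
        yield ROON_BLOG_API.format(blog_id=blog_id)


if __name__ == "__main__":
    # Print only the last two URLs of the assumed range.
    for url in roon_blog_api_urls(first_id=HIGHEST_KNOWN_ID - 1):
        print(url)
```

Some IDs in the range may not resolve, and new blogs could push the real maximum higher, so a grab would probably want to probe a little past the known bound.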
22:47
🔗
|
|
REiN^ has quit IRC (Read error: Connection reset by peer) |
22:47
🔗
|
arkiver |
Start: thanks, but first ep1c |
22:47
🔗
|
arkiver |
Tomorrow is ep1c day |
22:47
🔗
|
Start |
ok |
22:48
🔗
|
Start |
i'm guessing that ep1c's grab scripts will be very similar to viddy's? |
22:48
🔗
|
|
REiN^ has joined #archiveteam |
22:49
🔗
|
|
signius_ has quit IRC (Read error: Operation timed out) |
22:50
🔗
|
arkiver |
probably, but I'll see that tomorrow |
23:02
🔗
|
|
signius_ has joined #archiveteam |
23:07
🔗
|
|
ex-parro1 has quit IRC (Remote host closed the connection) |
23:07
🔗
|
dashcloud |
so tripod really is going down? |
23:08
🔗
|
|
Start has quit IRC (Ping timeout: 378 seconds) |
23:08
🔗
|
garyrh |
maaaybe |
23:08
🔗
|
dashcloud |
might as well grab angelfire while we're at it- you'd then have the three big players from the 90s |
23:10
🔗
|
xmc |
does lycos still host homepages? |
23:11
🔗
|
|
REiN^ has quit IRC (Read error: Operation timed out) |
23:15
🔗
|
dashcloud |
yeah- there's classic 90s Angelfire: http://www.angelfire.com/sd/ScrewAOL/ and the modern Angelfire: http://www.angelfire.lycos.com/ |
23:17
🔗
|
|
ex-parro1 has joined #archiveteam |
23:26
🔗
|
dashcloud |
so modern angelfire is probably not too hard to archive because there's sitemaps listing the pages: http://www.angelfire.com/sitemap-index-00.xml.gz |
23:26
🔗
|
dashcloud |
I started that project but when wget got killed because of memkiller, I stopped |
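Reading a gzip-compressed sitemap index entry by entry, rather than loading the whole document tree, might look like the sketch below. It assumes the standard sitemaps.org layout and does not address the wget memory problem dashcloud describes; `iter_sitemap_locs` is a hypothetical helper and the child sitemap names in the sample are invented.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_sitemap_locs(gz_bytes):
    """Yield <loc> values from gzip-compressed sitemap XML, one at a time."""
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as stream:
        for _event, element in ET.iterparse(stream, events=("end",)):
            # Strip any namespace prefix: "{ns}loc" -> "loc".
            if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
                yield element.text.strip()
            # Drop finished elements' contents to keep memory use low.
            element.clear()


# Illustrative sample only; the child sitemap names are hypothetical.
SAMPLE_INDEX = b"""<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.angelfire.com/sitemap-00.xml.gz</loc></sitemap>
  <sitemap><loc>http://www.angelfire.com/sitemap-01.xml.gz</loc></sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    for loc in iter_sitemap_locs(gzip.compress(SAMPLE_INDEX)):
        print(loc)
```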
23:50
🔗
|
|
cf has quit IRC (Quit: cf) |
23:57
🔗
|
godane |
so some of the KBS News Today i got are incomplete |
23:58
🔗
|
godane |
doing a 2nd rtmpdump gets me a bigger file |
23:58
🔗
|
godane |
i must have been doing too many at once |