Time |
Nickname |
Message |
00:03
π
|
|
rolfb has joined #archiveteam |
00:03
π
|
rolfb |
SketchCow: around? :-) |
00:06
π
|
|
rolfb has quit IRC (Linkinus - http://linkinus.com) |
00:37
π
|
|
schbirid2 has quit IRC (Quit: Leaving) |
00:39
π
|
|
primus104 has quit IRC (Leaving.) |
01:14
π
|
|
mistym has quit IRC (Remote host closed the connection) |
01:23
π
|
|
ehea617 has quit IRC (Quit: Page closed) |
01:40
π
|
|
WubTheCap has joined #archiveteam |
01:40
π
|
WubTheCap |
8chan is going to delete a lot of stuff soon, if that matters. https://twitter.com/infinitechan/status/574345499832029184 |
02:30
π
|
chfoo |
there was a paranoia grab in january: http://archive.fart.website/archivebot/viewer/job/1xnyf |
02:30
π
|
chfoo |
almost 300GB of data |
02:32
π
|
chfoo |
run it in archivebot again for a bit to grab recent things maybe? |
02:33
π
|
WubTheCap |
It's the redeem board page, let me find it |
02:33
π
|
WubTheCap |
03:36:33 @WubTheCaptain | copypaste: I don't like you going destroying boards like that. reddit doesn't delete inactive boards. At least make a dump and upload it to archive.org |
02:33
π
|
WubTheCap |
03:37:01 @WubTheCaptain | Also, https://twitter.com/infinitechan/status/574345499832029184 was vague for me, all three conditions required or one condition only? |
02:33
π
|
WubTheCap |
03:42:53 +copypaste | all 3 |
02:35
π
|
WubTheCap |
I can't seem to find it now |
02:35
π
|
WubTheCap |
Here it is. https://8ch.net/claim.html |
02:35
π
|
WubTheCap |
A list of the boards, at least |
02:39
π
|
chfoo |
i'll throw the subdirectories into archivebot in a few hours if no one does it by then |
02:40
π
|
|
okeuday has joined #archiveteam |
02:40
π
|
WubTheCap |
67-68 hours remaining |
02:49
π
|
|
Froggypwn has quit IRC (Ping timeout: 512 seconds) |
02:50
π
|
|
Froggypwn has joined #archiveteam |
02:55
π
|
|
nwf has quit IRC (WeeChat 1.0.1) |
03:26
π
|
|
nwf has joined #archiveteam |
03:28
π
|
|
Ymgve has quit IRC () |
04:01
π
|
|
khaoohs_ has joined #archiveteam |
04:04
π
|
|
oli has quit IRC (Read error: Operation timed out) |
04:04
π
|
|
T31m_ has joined #archiveteam |
04:04
π
|
|
khaoohs has quit IRC (Read error: Connection reset by peer) |
04:05
π
|
|
oli has joined #archiveteam |
04:09
π
|
|
T31M has quit IRC (Read error: Operation timed out) |
04:15
π
|
|
techapj has joined #archiveteam |
04:16
π
|
techapj |
Hello Archive team! |
04:17
π
|
techapj |
I need help with installation of wget-warc-lua on Ubuntu 14.04 server |
04:17
π
|
techapj |
I am an intern at Discourse and I have been assigned a task to archive fairly large vBulletin forum. I found your excellent vBulletin archive script: https://github.com/ArchiveTeam/wget-lua-forum-scripts/blob/master/vbulletin.lua |
04:18
π
|
techapj |
That script is exactly what I need for archiving vBulletin forum, but it needs wget-warc-lua installed/compiled on system. I tried compiling it via: https://github.com/ArchiveTeam/tabblo-grab/blob/master/get-wget-warc-lua.sh |
04:19
π
|
chfoo |
try using a recent build script from here for example: https://github.com/ArchiveTeam/testflight-grab |
04:19
π
|
|
test_ has joined #archiveteam |
04:20
π
|
chfoo |
the one from 2012 is too old |
04:21
π
|
techapj |
@chfoo thanks for the help, trying to compile from that right now |
04:23
π
|
techapj |
the build is failing |
04:23
π
|
techapj |
checking lua.h usability... no checking lua.h presence... no checking for lua.h... no checking lua5.1/lua.h usability... no checking lua5.1/lua.h presence... no checking for lua5.1/lua.h... no configure: error: lua not found wget-lua not successfully built. |
04:24
π
|
techapj |
I installed lua on Ubuntu 14.04 |
04:24
π
|
techapj |
~/testflight-grab# lua -v Lua 5.2.3 Copyright (C) 1994-2013 Lua.org, PUC-Rio root@wget-gearbox:~/testflight-grab# |
04:25
π
|
chfoo |
you need libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev bzip2 zlib1g-dev |
04:26
π
|
techapj |
ok, just installed these dependencies, trying building again |
04:31
π
|
techapj |
now getting error: |
04:31
π
|
techapj |
POD document had syntax errors at /usr/bin/pod2man line 71. make[2]: *** [wget.1] Error 255 |
04:33
π
|
chfoo |
the bit near the bottom of the readme should fix that |
04:34
π
|
chfoo |
wget-lua should already be built |
04:34
π
|
techapj |
oh sorry, should have read the README :) |
04:35
π
|
techapj |
thanks a lot for your help @chfoo! you are awesome ;) |
04:37
π
|
|
Kenshin has quit IRC (Ping timeout: 246 seconds) |
04:38
π
|
chfoo |
no problem |
04:41
π
|
techapj |
@chfoo i cannot find wget file in /get-wget-lua.tmp/src |
04:41
π
|
techapj |
i can see wget.h but not wget |
04:41
π
|
|
Kenshin has joined #archiveteam |
04:45
π
|
chfoo |
techapj: i guess you have a different error. could you see if you can make this change at https://github.com/ArchiveTeam/wget-lua/commit/5d7348c0d047331539ac38e64fdb53bb5e52aae4 and avoid trying to build the doc |
04:46
π
|
xmc |
techapj: welcome to archiveteam! (btw, typical way to mention people on irc is like i just did here, not with an @-sign.) |
04:46
π
|
xmc |
glad to hear someone else is tackling The Forums Problem :) |
04:47
π
|
techapj |
thanks xmc, this is my first time chatting in irc :) |
04:47
π
|
techapj |
xmc: vBulletin forum is a nightmare to archive |
04:47
π
|
xmc |
i guessed as much ... using @ to talk to people is a pretty strong indicator |
04:47
π
|
xmc |
yes. yes it is. |
04:48
π
|
xmc |
i am slowly working on a semi-secret project to archive all the forums. |
04:48
π
|
techapj |
the vbulletin.lua script is the only hope i have to archive a fairly large forum |
04:49
π
|
xmc |
care to say what forum it is? |
04:49
π
|
techapj |
oldforums.gearboxsoftware.com |
04:49
π
|
techapj |
they are now our client, moved to Discourse: http://forums.gearboxsoftware.com/ |
04:50
π
|
xmc |
ah that's fairly large then |
04:50
π
|
xmc |
i have a 1/4-working vbulletin crawler, abandoned dev on that a few years ago because it became unmaintainable |
04:51
π
|
xmc |
i was trying to make it resilient to variation across themes, which as it turns out is an AI-hard problem |
04:51
π
|
techapj |
is vbulletin.lua script reliable for gearbox vB archive? |
04:52
π
|
techapj |
to be honest, i have doubts |
04:52
π
|
xmc |
beats me. it will get all the things, if it doesn't explode in the process |
04:52
π
|
techapj |
i have spent more than 4 days struggling with pure wget, gave up at last :( |
04:52
π
|
xmc |
i see that they use normal URLs |
04:52
π
|
xmc |
yes, pure wget on a forum will explode quickly |
04:53
π
|
xmc |
are you using wget or wpull? |
04:53
π
|
techapj |
wget 1.16.2 |
04:53
π
|
xmc |
ah, wget. |
04:54
π
|
xmc |
so if wget doesn't work for you, you may get better results with wpull. |
04:54
π
|
xmc |
iirc wpull is more suited to be a crawler than wget, for a few reasons |
04:54
π
|
xmc |
such as it keeps its queue of urls in (iirc) a sqlite file, rather than ram |
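
A minimal sketch of the idea xmc describes here, a crawl queue kept in a SQLite file instead of in RAM. The class name, table layout, and status values are hypothetical and are not wpull's actual schema; the point is only that pending URLs survive on disk and are deduplicated by the database.

```python
import sqlite3


class UrlQueue:
    """Hypothetical disk-backed crawl queue (not wpull's real schema)."""

    def __init__(self, path):
        # Pass a file path to persist the queue; ":memory:" is for testing only.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS urls ("
            "url TEXT PRIMARY KEY, status TEXT NOT NULL DEFAULT 'todo')"
        )
        self.db.commit()

    def add(self, url):
        # INSERT OR IGNORE deduplicates: a URL already seen is not queued again.
        self.db.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
        self.db.commit()

    def next(self):
        # Take the oldest pending URL and mark it as in progress.
        row = self.db.execute(
            "SELECT url FROM urls WHERE status = 'todo' ORDER BY rowid LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.db.execute("UPDATE urls SET status = 'in_progress' WHERE url = ?", row)
        self.db.commit()
        return row[0]

    def done(self, url):
        # Record that the URL was fetched so it is never handed out again.
        self.db.execute("UPDATE urls SET status = 'done' WHERE url = ?", (url,))
        self.db.commit()
```

Because the state lives in a file, a crawl of several million URLs does not need to hold them all in memory, and an interrupted job can in principle be resumed from the same file.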
04:54
π
|
techapj |
ah, https://github.com/chfoo/wpull |
04:55
π
|
techapj |
made by chfoo :) |
04:55
π
|
xmc |
:) |
04:56
π
|
techapj |
so would you recommend that instead of using vbulletin.lua? |
04:57
π
|
xmc |
vbulletin.lua has treated me well in the past |
04:57
π
|
techapj |
the main problem with gearbox forum is: |
04:57
π
|
techapj |
Threads: 348,575, Posts: 4,879,823, Members: 519,365 |
04:57
π
|
xmc |
if the job turns out to be too big for wget, though, you might want to look into getting vbulletin.lua to run with wpull instead of wget |
04:57
π
|
xmc |
yes |
04:57
π
|
xmc |
it is a not small forum |
04:58
π
|
techapj |
i tried doing: `wget --mirror --adjust-extension --no-clobber --convert-links --random-wait --no-parent --page-requisites robots=off -U mozilla http://oldforums.gearboxsoftware.com/` |
04:58
π
|
techapj |
but it downloaded over 20GB of data, ran infinite loop |
04:59
π
|
xmc |
using a crawler directly on a blog site, without a script to tell it what to ignore, will result in an unsatisfactory output |
04:59
π
|
xmc |
please commit that sentence to memory |
04:59
π
|
xmc |
er |
04:59
π
|
xmc |
s/blog site/forum/ |
04:59
π
|
xmc |
sorry, it's 21:00 on saturday night and i just had my first coffee |
05:00
π
|
xmc |
to crawl a forum satisfactorily, you need one of two things: |
05:00
π
|
xmc |
1} |
05:00
π
|
techapj |
xmc: no problem, thanks for your help and recommendations |
05:00
π
|
xmc |
1} a large set of regular expressions to ignore urls |
05:00
π
|
techapj |
i really apprecite it :) |
05:00
π
|
xmc |
2} a script to drive the crawler |
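
A rough sketch of option 1, a set of regular expressions used to skip URLs. The patterns below are illustrative guesses at common vBulletin 3.x URL shapes (session hashes, reply forms, print views, sort orders); they are assumptions, not taken from vbulletin.lua, and would need checking against the actual forum.

```python
import re

# Hypothetical ignore list; each pattern targets a class of duplicate or
# non-content URLs that tends to make a naive forum crawl loop forever.
IGNORE_PATTERNS = [
    re.compile(r"[?&]s=[0-9a-f]{32}"),  # session hash appended to links
    re.compile(r"/(newreply|newthread|sendmessage|search|login|register)\.php"),
    re.compile(r"/printthread\.php"),  # printable duplicate of each thread
    re.compile(r"[?&]goto=(nextnewest|nextoldest|newpost|lastpost)"),
    re.compile(r"[?&]mode=(threaded|hybrid|linear)"),  # alternate display modes
    re.compile(r"[?&](sort|order|daysprune)="),  # re-sorted forum listings
]


def should_fetch(url):
    """Return True if no ignore pattern matches the URL."""
    return not any(pattern.search(url) for pattern in IGNORE_PATTERNS)
```

A crawler would call `should_fetch` on every discovered link before queueing it, so that thread pages are kept while the many alternate views of the same content are dropped.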
05:00
π
|
xmc |
sure thing |
05:00
π
|
techapj |
i really appreciate it :) |
05:01
π
|
xmc |
my pleasure |
05:01
π
|
techapj |
and still learning how to chat in irc ;) |
05:01
π
|
chfoo |
oh, there's this too https://github.com/ludios/grab-site if you want to ignore urls on the fly |
05:02
π
|
techapj |
chfoo: thanks! will look into it |
05:07
π
|
techapj |
chfoo: I am a wget beginner and never archived a site/forum/blog before. What would you recommend me for archiving a large vB forum? Until now i have figured out options like vbulletin.lua, wpull. but what would you recommend? |
05:11
π
|
chfoo |
techapj: i guess you could take a look at running heritrix (i never used it before) or perhaps try setting up archiveteam's archivebot (given that you change the user-agent and other hardcoded settings) |
05:16
π
|
techapj |
chfoo: ok, will look into it. thanks for your help :) |
05:34
π
|
fenn |
since you work with the company hosting the forum, can't you just ask for a database dump? |
05:36
π
|
techapj |
fenn: we have database dump, but we want to convert the vB forum to read only static HTML archive |
05:37
π
|
techapj |
we have already imported the old data in new Discourse forum |
05:37
π
|
techapj |
but gearbox wants to host a read only static HTML of old forum |
05:52
π
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
05:54
π
|
|
dashcloud has joined #archiveteam |
05:57
π
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
06:05
π
|
|
dashcloud has joined #archiveteam |
06:11
π
|
xmc |
i'd like to move the warrior tracker machine (shilling) to a different hosting provider before the end of march |
06:12
π
|
xmc |
headsup, chfoo Smiley yipdw GLaDOS arkiver underscor |
06:15
π
|
yipdw |
cool np |
06:15
π
|
yipdw |
FWIW, DigitalOcean seems to do ok |
06:15
π
|
xmc |
oh, i have a provider in mind |
06:15
π
|
yipdw |
ah |
06:16
π
|
xmc |
i'm a part owner in http://vpssd.com/ so it'll be $40 a month of actual savings for me |
06:16
π
|
yipdw |
is there an ArchiveTeam discount |
06:17
π
|
xmc |
lol |
06:17
π
|
xmc |
it's a money-losing business venture already, why would we give out discounts |
06:18
π
|
Ctrl-S |
how is it losing money? |
06:18
π
|
WubTheCap |
DigitalOcean is ready to kick you out on first abuse notices, even if they're not valid. |
06:19
π
|
WubTheCap |
Also, >clown >no OpenBSD ISOs |
06:19
π
|
xmc |
Ctrl-S: we don't have enough paying customers to pay all of the colo bills. |
06:19
π
|
xmc |
basically. |
06:20
π
|
WubTheCap |
I'm getting a Portlane colo next week. |
06:21
π
|
WubTheCap |
xmc: SVColo is quite an unusual data center choice. |
06:21
π
|
WubTheCap |
For what the webserver is hosted on, at least |
06:21
π
|
xmc |
-> #archiveteam-bs |
06:26
π
|
xmc |
anyway, people who have shit on that box ( chfoo Smiley yipdw GLaDOS arkiver underscor ), ping me if i need to do something other than rsync it over |
06:27
π
|
yipdw |
xmc: it should be fine, just let me or chfoo know before you shut it down so we can get redis to properly shut down and write |
06:28
π
|
xmc |
def. i think i'll do it next weekend, unless we're in the middle of a firedrill. |
06:28
π
|
yipdw |
ok |
06:28
π
|
xmc |
turn off all the services, point the dns to the new box, rsync everything over, turn on services on new box ... ? |
06:29
π
|
yipdw |
provided it's an rsync from / I don't see a reason why it wouldn't work |
06:29
π
|
* |
xmc nods |
06:29
π
|
xmc |
sounds good |
06:30
π
|
xmc |
/home/tinytown has about 40G |
06:30
π
|
xmc |
oof |
06:30
π
|
xmc |
:P |
06:31
π
|
xmc |
the box is previous version of debian, how much of a little hell would it be to go from deb 6 to deb 7 on the new box? |
06:31
π
|
xmc |
and not a full / rsync |
06:32
π
|
WubTheCap |
Depends on your partitioning and the RNG |
06:32
π
|
xmc |
? |
06:33
π
|
xmc |
i'm also a fan of occasionally redeploying machines from scratch, to head off bitrot |
06:33
π
|
yipdw |
we could just see what happens |
06:33
π
|
yipdw |
unless you're planning to turn off the old tracker host immediately |
06:33
π
|
xmc |
that's the spirit! |
06:34
π
|
yipdw |
it's probably also the only sane approach :P |
06:34
π
|
yipdw |
everything else gets lost in HN-style what-if fappage |
06:34
π
|
xmc |
nah I can keep it up until the end of the month |
06:35
π
|
xmc |
http://xrtc.net/f/pixen/we-stop-bit-rot.png |
06:35
π
|
yipdw |
i should eventually do the same for archivebot's control host |
06:36
π
|
yipdw |
it's like on Ubuntu pwned.04 |
06:36
π
|
yipdw |
LTS |
06:36
π
|
xmc |
hahaha |
06:36
π
|
xmc |
long term suckage |
06:40
π
|
|
Sk1d has quit IRC (Ping timeout: 265 seconds) |
06:48
π
|
|
edward_ has joined #archiveteam |
07:22
π
|
|
signius has quit IRC (Ping timeout: 306 seconds) |
07:34
π
|
|
signius has joined #archiveteam |
07:43
π
|
|
X-Scale has joined #archiveteam |
07:43
π
|
|
primus104 has joined #archiveteam |
07:49
π
|
|
mistym has joined #archiveteam |
08:38
π
|
|
nertzy has joined #archiveteam |
08:39
π
|
|
dashcloud has quit IRC (Read error: Connection reset by peer) |
08:40
π
|
|
nertzy2 has quit IRC (Read error: Operation timed out) |
08:40
π
|
|
codinghor has joined #archiveteam |
08:40
π
|
codinghor |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
08:42
π
|
codinghor |
this is a very unfortunate substring length for my handle |
08:43
π
|
|
dashcloud has joined #archiveteam |
08:44
π
|
DFJustin |
yahoosucks |
08:45
π
|
codinghor |
that worked thanks |
08:47
π
|
|
nertzy has quit IRC (Read error: Operation timed out) |
08:49
π
|
|
edward_ has quit IRC (Ping timeout: 512 seconds) |
09:05
π
|
|
lag has joined #archiveteam |
09:19
π
|
xmc |
codinghor: yeah. you might want to trim the other side: inghorror |
09:20
π
|
xmc |
though i'm fond of dinghorro |
09:32
π
|
|
mistym has quit IRC (Remote host closed the connection) |
09:53
π
|
|
primus104 has quit IRC (Leaving.) |
10:08
π
|
codinghor |
in the future, people will be able to use handles of MORE than 10 characters! Madness. |
10:13
π
|
xmc |
sheeeit, we're still stuck at 9 over here |
10:25
π
|
Sanqui |
cdnghrrr, and you get one spare |
10:25
π
|
Smiley |
xmc: kill all my stuff |
10:26
π
|
xmc |
nuke the account and everything in it? |
10:26
π
|
Smiley |
yup |
10:26
π
|
xmc |
ok |
10:26
π
|
Sanqui |
what |
10:26
π
|
xmc |
there are no files in your ~ anyway |
10:26
π
|
Smiley |
nod |
10:26
π
|
Sanqui |
we don't nuke things over here |
10:27
π
|
Smiley |
not sure i even used that one? |
10:27
π
|
xmc |
ok fine i will comment out the line in /etc/passwd and carefully move the ~ to somewhere in /var/backup |
10:27
π
|
Sanqui |
excellent |
10:27
π
|
Smiley |
lol k |
10:28
π
|
Sanqui |
"it's empty" is also data |
10:28
π
|
xmc |
just kidding |
10:28
π
|
xmc |
deluser is fine |
10:28
π
|
xmc |
this conversation records that it is empty |
10:29
π
|
xmc |
oh hey there are hidden history files |
10:29
π
|
xmc |
how the hell did you view redtube on this box |
10:29
π
|
Sanqui |
lmao |
10:29
π
|
xmc |
and why is it in your redis_cli history |
10:30
π
|
xmc |
actually, switch the "how" and "why" in those two sentences |
10:31
π
|
Sanqui |
[silence] |
10:45
π
|
|
nertzy has joined #archiveteam |
11:06
π
|
|
BlueMaxim has quit IRC (Read error: Connection reset by peer) |
11:32
π
|
|
schbirid has joined #archiveteam |
11:42
π
|
|
primus104 has joined #archiveteam |
11:58
π
|
|
Sellyme_ has quit IRC (Ping timeout: 265 seconds) |
11:58
π
|
|
Sellyme has joined #archiveteam |
12:03
π
|
|
Ymgve has joined #archiveteam |
12:26
π
|
|
primus104 has quit IRC (Read error: Connection reset by peer) |
12:31
π
|
|
primus104 has joined #archiveteam |
12:41
π
|
|
codinghor has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client) |
12:44
π
|
|
dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.) |
12:45
π
|
|
dashcloud has joined #archiveteam |
12:47
π
|
|
RuairiCOL has joined #archiveteam |
12:48
π
|
RuairiCOL |
Hello all - a site I'm a fan of is at risk because its file download server has gone offline - I have a copy of the entire folder structure up until November 2014 and I'm uploading it to my own server for the moment, but I'm wondering if archiveteam have somewhere I can store it in addition just in case |
12:48
π
|
RuairiCOL |
The site is hybridized.org and contains a load of DJ sets |
12:49
π
|
RuairiCOL |
I've got around 190GB to upload |
12:49
π
|
RuairiCOL |
SketchCow: *boop* I'm @rc55 from Twitter and the demoscene, I also run a small demoparty called Sundown :) |
12:50
π
|
Ctrl-S |
Internet archive will want it |
12:50
π
|
schbirid |
definitely IA material imo |
12:53
π
|
RuairiCOL |
cool - the upload tool is a bit of a blunt instrument, is there any way I can ftp up the stuff considering the size? There are also cue files that have useful metadata |
12:56
π
|
Smiley |
errr |
12:56
π
|
Smiley |
i wasn't viewing redtube :D |
12:56
π
|
Smiley |
we may have looked at archiving it D: |
12:58
π
|
|
dashcloud has quit IRC (Read error: Connection reset by peer) |
12:59
π
|
|
dashcloud has joined #archiveteam |
13:02
π
|
Ctrl-S |
why not just zip it all? |
13:02
π
|
Ctrl-S |
site.tar.gz or site.zip |
13:11
π
|
RuairiCOL |
I'll work on that - I'll get this first upload done then upload it to archive.org next week (192gb at 512k p/sec...) |
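
A back-of-the-envelope estimate of that upload time. The message does not say whether "512k p/sec" means kilobytes or kilobits per second, so this sketch computes both; the binary units (GiB, KiB) are also an assumption.

```python
def upload_days(size_bytes, rate_bytes_per_sec):
    """Days needed to send size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_sec / 86400


GIB = 1024 ** 3
KIB = 1024

# Assumption A: the link carries 512 kilobytes per second.
days_if_kilobytes = upload_days(192 * GIB, 512 * KIB)
# Assumption B: the link carries 512 kilobits per second (one eighth of A).
days_if_kilobits = upload_days(192 * GIB, 512 * KIB / 8)

print(round(days_if_kilobytes, 1), round(days_if_kilobits, 1))
```

Under the first reading the transfer takes roughly four and a half days; under the second it takes about five weeks, which is why the unit matters when planning the upload.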
13:17
π
|
Ctrl-S |
ask the others before you bother doing anything |
13:17
π
|
Ctrl-S |
i'm not a good source of advice |
13:27
π
|
Kazzy |
RuairiCOL: archive.org does have a way of uploading as a torrent, which sounds like it's easier for you http://archiveteam.org/index.php?title=Internet_Archive#Uploading_to_archive.org |
13:53
π
|
schbirid |
dont zip things like media files |
13:54
π
|
schbirid |
you can upload with curl, s3cmd or some python/perl tools |
13:55
π
|
Smiley |
and the iauploader script |
14:34
π
|
Sanqui |
Is there something other than the WaybackMachine I can check for removed/expired pastebins? |
14:37
π
|
trs80 |
that sounds like a good project for urlteam |
14:37
π
|
Ctrl-S |
pastebins? |
14:37
π
|
Ctrl-S |
i've got code for pastebin stuff |
14:37
π
|
Ctrl-S |
sort of |
14:38
π
|
Ctrl-S |
recursive download stuff |
14:38
π
|
Sanqui |
well, I'm looking for one specific paste |
14:38
π
|
Ctrl-S |
there's also a AT project for new pastes |
14:38
π
|
Ctrl-S |
why? |
14:38
π
|
Sanqui |
from 2013 |
14:38
π
|
Sanqui |
well, the paste "has been removed" and the author doesn't have it any more |
15:07
π
|
|
Emcy has quit IRC (Read error: Connection reset by peer) |
15:07
π
|
|
Emcy has joined #archiveteam |
15:18
π
|
|
ohhdemgir has quit IRC (Read error: Operation timed out) |
15:20
π
|
|
ohhdemgir has joined #archiveteam |
15:22
π
|
|
Emcy has quit IRC (Read error: Connection reset by peer) |
15:53
π
|
|
Emcy has joined #archiveteam |
16:00
π
|
chfoo |
http://blog.postach.io/post/brand-new-version-launched-billing-changes |
16:23
π
|
|
dashcloud has quit IRC (Read error: Connection reset by peer) |
16:24
π
|
|
dashcloud has joined #archiveteam |
16:39
π
|
|
Emcy has quit IRC (Ping timeout: 512 seconds) |
16:48
π
|
|
mietek has joined #archiveteam |
16:51
π
|
mietek |
Does anyone happen to have an archived copy of ftp://ftp.cpsc.ucalgary.ca/pub/projects/charity/ ? |
16:51
π
|
mietek |
The web site is still online, but all the source/code links are dead: http://pll.cpsc.ucalgary.ca/charity1/www/home.html |
16:54
π
|
ats |
mietek: https://synrc.com/publications/cat/Functional%20Languages/Charity/ ? |
16:55
π
|
mietek |
ats: wow, cool. |
16:56
π
|
mietek |
ats as in the ATS language? |
16:57
π
|
ats |
no, ats as in my name ;-) |
16:57
π
|
mietek |
Thanks. Was that just a good Google search? |
16:58
π
|
ats |
yup; I searched for charity-src.tar.gz... |
16:58
π
|
mietek |
Thanks. |
17:55
π
|
|
mistym has joined #archiveteam |
17:57
π
|
|
dashcloud has quit IRC (Ping timeout: 370 seconds) |
17:57
π
|
|
dashcloud has joined #archiveteam |
18:02
π
|
|
Jonimus has quit IRC (Ping timeout: 260 seconds) |
18:07
π
|
|
rolfb has joined #archiveteam |
18:11
π
|
mietek |
Unfortunately the example programs seem not to have been mirrored anywhere.. |
18:12
π
|
mietek |
(http://pll.cpsc.ucalgary.ca/charity1/www/examples.html) |
18:14
π
|
balrog |
mietek: have you tried emailing them? |
18:14
π
|
mietek |
Not yet. Will do. |
18:15
π
|
mietek |
First collecting what I can before I start bothering people who apparently gave up on the thing 15 years ago. |
18:15
π
|
mietek |
Sorry if this isn't the right channel for these questions. |
18:17
π
|
|
rolfb has quit IRC (Leaving...) |
18:20
π
|
mietek |
I was hoping there's some secret FTP analog of archive.org which someone here might know. |
18:22
π
|
balrog |
mietek: unfortunately there isn't. people have started for FTPs, but it's a bit late |
18:44
π
|
|
the_fox is now known as TheOtherF |
18:45
π
|
|
TheOtherF is now known as OtherFox |
18:57
π
|
|
lag2 has joined #archiveteam |
19:01
π
|
|
lag has quit IRC (Ping timeout: 512 seconds) |
19:12
π
|
|
robink has quit IRC (Quit: No Ping reply in 180 seconds.) |
19:13
π
|
Start |
is there any good way to scrape yahoo automatically? |
19:13
π
|
|
robink has joined #archiveteam |
19:13
π
|
Start |
i normally save search engine pages manually and extract urls with regex, but in the case of google business sitebuilder there's over 100 result pages per domain |
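
A small sketch of the manual workflow Start describes: pull links out of saved search result pages with a regex and keep only hosts under a target domain. The function name and the `example.com` domain are hypothetical, and the regex only handles double-quoted absolute `href` values, which is an assumption about how the saved pages are marked up.

```python
import re
from urllib.parse import urlsplit

# Matches absolute http(s) links inside double-quoted href attributes.
HREF_RE = re.compile(r'href="(https?://[^"]+)"')


def extract_hosts(html, domain_suffix):
    """Return the sorted, deduplicated hostnames under domain_suffix."""
    hosts = set()
    for url in HREF_RE.findall(html):
        host = urlsplit(url).hostname or ""
        if host == domain_suffix or host.endswith("." + domain_suffix):
            hosts.add(host)
    return sorted(hosts)
```

Running this over each saved result page and merging the lists gives a set of discovered subdomains without copying URLs by hand.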
19:19
π
|
winr4r |
Start: yes, use the bing API |
19:35
π
|
xmc |
bling api |
19:36
π
|
fenn |
Sanqui: https://archive.org/details/pastebinpastes goes back to 2013-10-30 |
20:18
π
|
|
yan has joined #archiveteam |
20:50
π
|
|
bzc6p has joined #archiveteam |
20:51
π
|
bzc6p |
Start: http://archiveteam.org/index.php?title=Site_exploration |
20:57
π
|
|
mistym has quit IRC (Remote host closed the connection) |
21:13
π
|
|
bzc6p has left |
21:32
π
|
|
Emcy has joined #archiveteam |
21:37
π
|
|
schbirid has quit IRC (Quit: Leaving) |
21:49
π
|
|
Ravenloft has joined #archiveteam |
22:31
π
|
Start |
where are the nokia memories archives? i can't find them anywhere |
22:35
π
|
|
BlueMaxim has joined #archiveteam |
22:53
π
|
|
serapeum has joined #archiveteam |
22:56
π
|
|
Emcy_ has joined #archiveteam |
22:56
π
|
|
Emcy has quit IRC (Ping timeout: 306 seconds) |
22:58
π
|
|
lag2 has quit IRC (Read error: Operation timed out) |
23:52
π
|
|
signius has quit IRC (Ping timeout: 306 seconds) |