#archiveteam 2015-03-08,Sun


Time Nickname Message
00:03 rolfb has joined #archiveteam
00:03 rolfb SketchCow: around? :-)
00:06 rolfb has quit IRC (Linkinus - http://linkinus.com)
00:37 schbirid2 has quit IRC (Quit: Leaving)
00:39 primus104 has quit IRC (Leaving.)
01:14 mistym has quit IRC (Remote host closed the connection)
01:23 ehea617 has quit IRC (Quit: Page closed)
01:40 WubTheCap has joined #archiveteam
01:40 WubTheCap 8chan is going to delete a lot of stuff soon, if that matters. https://twitter.com/infinitechan/status/574345499832029184
02:30 chfoo there was a paranoia grab in january: http://archive.fart.website/archivebot/viewer/job/1xnyf
02:30 chfoo almost 300GB of data
02:32 chfoo run it in archivebot again for a bit to grab recent things maybe?
02:33 WubTheCap It's the redeem board page, let me find it
02:33 WubTheCap 03:36:33 @WubTheCaptain | copypaste: I don't like you going destroying boards like that. reddit doesn't delete inactive boards. At least make a dump and upload it to archive.org
02:33 WubTheCap 03:37:01 @WubTheCaptain | Also, https://twitter.com/infinitechan/status/574345499832029184 was vague for me, all three conditions required or one condition only?
02:33 WubTheCap 03:42:53 +copypaste | all 3
02:35 WubTheCap I can't seem to find it now
02:35 WubTheCap Here it is. https://8ch.net/claim.html
02:35 WubTheCap A list of the boards, at least
02:39 chfoo i'll throw the subdirectories into archivebot in a few hours if no one does it by then
02:40 okeuday has joined #archiveteam
02:40 WubTheCap 67-68 hours remaining
02:49 Froggypwn has quit IRC (Ping timeout: 512 seconds)
02:50 Froggypwn has joined #archiveteam
02:55 nwf has quit IRC (WeeChat 1.0.1)
03:26 nwf has joined #archiveteam
03:28 Ymgve has quit IRC ()
04:01 khaoohs_ has joined #archiveteam
04:04 oli has quit IRC (Read error: Operation timed out)
04:04 T31m_ has joined #archiveteam
04:04 khaoohs has quit IRC (Read error: Connection reset by peer)
04:05 oli has joined #archiveteam
04:09 T31M has quit IRC (Read error: Operation timed out)
04:15 techapj has joined #archiveteam
04:16 techapj Hello Archive team!
04:17 techapj I need help with installation of wget-warc-lua on Ubuntu 14.04 server
04:17 techapj I am an intern at Discourse and I have been assigned a task to archive fairly large vBulletin forum. I found your excellent vBulletin archive script: https://github.com/ArchiveTeam/wget-lua-forum-scripts/blob/master/vbulletin.lua
04:18 techapj That script is exactly what I need for archiving vBulletin forum, but it needs wget-warc-lua installed/compiled on system. I tried compiling it via: https://github.com/ArchiveTeam/tabblo-grab/blob/master/get-wget-warc-lua.sh
04:19 chfoo try using a recent build script from here for example: https://github.com/ArchiveTeam/testflight-grab
04:19 test_ has joined #archiveteam
04:20 chfoo the one from 2012 is too old
04:21 techapj @chfoo thanks for the help, trying to compile from that right now
04:23 techapj the build is failing
04:23 techapj checking lua.h usability... no checking lua.h presence... no checking for lua.h... no checking lua5.1/lua.h usability... no checking lua5.1/lua.h presence... no checking for lua5.1/lua.h... no configure: error: lua not found wget-lua not successfully built.
04:24 techapj I installed lua on Ubuntu 14.04
04:24 techapj ~/testflight-grab# lua -v Lua 5.2.3 Copyright (C) 1994-2013 Lua.org, PUC-Rio root@wget-gearbox:~/testflight-grab#
04:25 chfoo you need libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev bzip2 zlib1g-dev
04:26 techapj ok, just installed these dependencies, trying building again
04:31 techapj now getting error:
04:31 techapj POD document had syntax errors at /usr/bin/pod2man line 71. make[2]: *** [wget.1] Error 255
04:33 chfoo the bit near the bottom of the readme should fix that
04:34 chfoo wget-lua should already be built
04:34 techapj oh sorry, should have read the README :)
04:35 techapj thanks a lot for your help @chfoo! you are awesome ;)
04:37 Kenshin has quit IRC (Ping timeout: 246 seconds)
04:38 chfoo no problem
04:41 techapj @chfoo i cannot find wget file in /get-wget-lua.tmp/src
04:41 techapj i can see wget.h but not wget
04:41 Kenshin has joined #archiveteam
04:45 chfoo techapj: i guess you have a different error. could you see if you can make this change at https://github.com/ArchiveTeam/wget-lua/commit/5d7348c0d047331539ac38e64fdb53bb5e52aae4 and avoid trying to build the doc
04:46 xmc techapj: welcome to archiveteam! (btw, typical way to mention people on irc is like i just did here, not with an @-sign.)
04:46 xmc glad to hear someone else is tackling The Forums Problem :)
04:47 techapj thanks xmc, this is my first time chatting in irc :)
04:47 techapj xmc: vBulletin forum is a nightmare to archive
04:47 xmc i guessed as much ... using @ to talk to people is a pretty strong indicator
04:47 xmc yes. yes it is.
04:48 xmc i am slowly working on a semi-secret project to archive all the forums.
04:48 techapj the vbulletin.lua script is the only hope i have to archive a fairly large forum
04:49 xmc care to say what forum it is?
04:49 techapj oldforums.gearboxsoftware.com
04:49 techapj they are now our client, moved to Discourse: http://forums.gearboxsoftware.com/
04:50 xmc ah that's fairly large then
04:50 xmc i have a 1/4-working vbulletin crawler, abandoned dev on that a few years ago because it became unmaintainable
04:51 xmc i was trying to make it resilient to variation across themes, which as it turns out is an AI-hard problem
04:51 techapj is vbulletin.lua script reliable for gearbox vB archive?
04:52 techapj to be honest, i have doubts
04:52 xmc beats me. it will get all the things, if it doesn't explode in the process
04:52 techapj i have spent more than 4 days struggling with pure wget, gave up at last :(
04:52 xmc i see that they use normal URLs
04:52 xmc yes, pure wget on a forum will explode quickly
04:53 xmc are you using wget or wpull?
04:53 techapj wget 1.16.2
04:53 xmc ah, wget.
04:54 xmc so if wget doesn't work for you, you may get better results with wpull.
04:54 xmc iirc wpull is more suited to be a crawler than wget, for a few reasons
04:54 xmc such as it keeps its queue of urls in (iirc) a sqlite file, rather than ram
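The point about keeping the URL queue in a SQLite file rather than RAM can be illustrated with a small sketch. This is not wpull's actual schema or API; the `UrlQueue` class, the `urls` table, and its column names are all hypothetical, chosen only to show why a database-backed frontier copes with a forum of millions of URLs and survives a restart.

```python
import sqlite3


class UrlQueue:
    """A tiny crawl frontier stored in SQLite instead of an in-memory list.

    Hypothetical illustration only; not wpull's real table layout.
    """

    def __init__(self, path=":memory:"):
        # Pass a filename to keep the queue on disk; ":memory:" is for demos.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS urls ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " url TEXT UNIQUE NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'todo')"
        )

    def add(self, url):
        """Queue a URL; return False if it was already known."""
        cur = self.db.execute(
            "INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,)
        )
        self.db.commit()
        return cur.rowcount == 1

    def pop(self):
        """Return the oldest unfetched URL and mark it done, or None."""
        row = self.db.execute(
            "SELECT id, url FROM urls WHERE status = 'todo' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.db.execute(
            "UPDATE urls SET status = 'done' WHERE id = ?", (row[0],)
        )
        self.db.commit()
        return row[1]
```

Because the `UNIQUE` constraint lives in the database, duplicate detection costs no extra memory, and a crawl that dies partway can reopen the same file and continue from the remaining `todo` rows.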
04:54 techapj ah, https://github.com/chfoo/wpull
04:55 techapj made by chfoo :)
04:55 xmc :)
04:56 techapj so would you recommend that instead of using vbulletin.lua?
04:57 xmc vbulletin.lua has treated me well in the past
04:57 techapj the main problem with gearbox forum is:
04:57 techapj Threads: 348,575, Posts: 4,879,823, Members: 519,365
04:57 xmc if the job turns out to be too big for wget, though, you might want to look into getting vbulletin.lua to run with wpull instead of wget
04:57 xmc yes
04:57 xmc it is a not small forum
04:58 techapj i tried doing: `wget --mirror --adjust-extension --no-clobber --convert-links --random-wait --no-parent --page-requisites robots=off -U mozilla http://oldforums.gearboxsoftware.com/`
04:58 techapj but it downloaded over 20GB of data, ran infinite loop
04:59 xmc using a crawler directly on a blog site, without a script to tell it what to ignore, will result in an unsatisfactory output
04:59 xmc please commit that sentence to memory
04:59 xmc er
04:59 xmc s/blog site/forum/
04:59 xmc sorry, it's 21:00 on saturday night and i just had my first coffee
05:00 xmc to crawl a forum satisfactorily, you need one of two things:
05:00 xmc 1}
05:00 techapj xmc: no problem, thanks for your help and recommendations
05:00 xmc 1} a large set of regular expressions to ignore urls
05:00 techapj i really apprecite it :)
05:00 xmc 2} a script to drive the crawler
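The first option, a set of regular expressions to ignore URLs, might look roughly like the sketch below. The patterns are illustrative guesses at common vBulletin URL shapes; they are not taken from vbulletin.lua or from any real ArchiveTeam job, and `should_fetch` is a hypothetical helper name. A real ignore set for a forum this size would likely be much longer.

```python
import re

# Hypothetical ignore list for a vBulletin-style forum. These patterns are
# illustrative assumptions, not copied from vbulletin.lua or a real job.
IGNORE_PATTERNS = [
    r"[?&]s=[0-9a-f]{32}",                 # session hash appended to links
    r"/(newreply|newthread|sendmessage|search|report|private)\.php",
    r"/printthread\.php",                  # print view duplicates each thread
    r"[?&]mode=(linear|hybrid|threaded)",  # alternate display modes
    r"[?&]goto=(nextnewest|nextoldest|newpost|lastpost)",  # redirect helpers
]
IGNORE_RE = re.compile("|".join(f"(?:{p})" for p in IGNORE_PATTERNS))


def should_fetch(url):
    """Return True if the URL matches none of the ignore patterns."""
    return IGNORE_RE.search(url) is None
```

Filtering like this is what keeps a recursive crawl from looping: every thread is reachable through several display modes and session-tagged links, so without an ignore set the crawler keeps finding "new" URLs for pages it already has.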
05:00 xmc sure thing
05:00 techapj i really appreciate it :)
05:01 xmc my pleasure
05:01 techapj and still learning how to chat in irc ;)
05:01 chfoo oh, there's this too https://github.com/ludios/grab-site if you want to ignore urls on the fly
05:02 techapj chfoo: thanks! will look into it
05:07 techapj chfoo: I am a wget beginner and never archived a site/forum/blog before. What would you recommend me for archiving a large vB forum? Until now i have figured out options like vbulletin.lua, wpull. but what would you recommend?
05:11 chfoo techapj: i guess you could take a look at running heritrix (i never used it before) or perhaps try setting up archiveteam's archivebot (given that you change the user-agent and other hardcoded settings)
05:16 techapj chfoo: ok, will look into it. thanks for your help :)
05:34 fenn since you work with the company hosting the forum, can't you just ask for a database dump?
05:36 techapj fenn: we have database dump, but we want to convert the vB forum to read only static HTML archive
05:37 techapj we have already imported the old data in new Discourse forum
05:37 techapj but gearbox wants to host a read only static HTML of old forum
05:52 dashcloud has quit IRC (Read error: Operation timed out)
05:54 dashcloud has joined #archiveteam
05:57 dashcloud has quit IRC (Read error: Operation timed out)
06:05 dashcloud has joined #archiveteam
06:11 xmc i'd like to move the warrior tracker machine (shilling) to a different hosting provider before the end of march
06:12 xmc headsup, chfoo Smiley yipdw GLaDOS arkiver underscor
06:15 yipdw cool np
06:15 yipdw FWIW, DigitalOcean seems to do ok
06:15 xmc oh, i have a provider in mind
06:15 yipdw ah
06:16 xmc i'm a part owner in http://vpssd.com/ so it'll be $40 a month of actual savings for me
06:16 yipdw is there an ArchiveTeam discount
06:17 xmc lol
06:17 xmc it's a money-losing business venture already, why would we give out discounts
06:18 Ctrl-S how is it losing money?
06:18 WubTheCap DigitalOcean is ready to kick you out on first abuse notices, even if they're not valid.
06:19 WubTheCap Also, >clown >no OpenBSD ISOs
06:19 xmc Ctrl-S: we don't have enough paying customers to pay all of the colo bills.
06:19 xmc basically.
06:20 WubTheCap I'm getting a Portlane colo next week.
06:21 WubTheCap xmc: SVColo is quite an unusual data center choice.
06:21 WubTheCap For what the webserver is hosted on, at least
06:21 xmc -> #archiveteam-bs
06:26 xmc anyway, people who have shit on that box ( chfoo Smiley yipdw GLaDOS arkiver underscor ), ping me if i need to do something other than rsync it over
06:27 yipdw xmc: it should be fine, just let me or chfoo know before you shut it down so we can get redis to properly shut down and write
06:28 xmc def. i think i'll do it next weekend, unless we're in the middle of a firedrill.
06:28 yipdw ok
06:28 xmc turn off all the services, point the dns to the new box, rsync everything over, turn on services on new box ... ?
06:29 yipdw provided it's an rsync from / I don't see a reason why it wouldn't work
06:29 * xmc nods
06:29 xmc sounds good
06:30 xmc /home/tinytown has about 40G
06:30 xmc oof
06:30 xmc :P
06:31 xmc the box is previous version of debian, how much of a little hell would it be to go from deb 6 to deb 7 on the new box?
06:31 xmc and not a full / rsync
06:32 WubTheCap Depends on your partitioning and the RNG
06:32 xmc ?
06:33 xmc i'm also a fan of occasionally redeploying machines from scratch, to head off bitrot
06:33 yipdw we could just see what happens
06:33 yipdw unless you're planning to turn off the old tracker host immediately
06:33 xmc that's the spirit!
06:34 yipdw it's probably also the only sane approach :P
06:34 yipdw everything else gets lost in HN-style what-if fappage
06:34 xmc nah I can keep it up until the end of the month
06:35 xmc http://xrtc.net/f/pixen/we-stop-bit-rot.png
06:35 yipdw i should eventually do the same for archivebot's control host
06:36 yipdw it's like on Ubuntu pwned.04
06:36 yipdw LTS
06:36 xmc hahaha
06:36 xmc long term suckage
06:40 Sk1d has quit IRC (Ping timeout: 265 seconds)
06:48 edward_ has joined #archiveteam
07:22 signius has quit IRC (Ping timeout: 306 seconds)
07:34 signius has joined #archiveteam
07:43 X-Scale has joined #archiveteam
07:43 primus104 has joined #archiveteam
07:49 mistym has joined #archiveteam
08:38 nertzy has joined #archiveteam
08:39 dashcloud has quit IRC (Read error: Connection reset by peer)
08:40 nertzy2 has quit IRC (Read error: Operation timed out)
08:40 codinghor has joined #archiveteam
08:40 codinghor WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
08:42 codinghor this is a very unfortunate substring length for my handle
08:43 dashcloud has joined #archiveteam
08:44 DFJustin yahoosucks
08:45 codinghor that worked thanks
08:47 nertzy has quit IRC (Read error: Operation timed out)
08:49 edward_ has quit IRC (Ping timeout: 512 seconds)
09:05 lag has joined #archiveteam
09:19 xmc codinghor: yeah. you might want to trim the other side: inghorror
09:20 xmc though i'm fond of dinghorro
09:32 mistym has quit IRC (Remote host closed the connection)
09:53 primus104 has quit IRC (Leaving.)
10:08 codinghor in the future, people will be able to use handles of MORE than 10 characters! Madness.
10:13 xmc sheeeit, we're still stuck at 9 over here
10:25 Sanqui cdnghrrr, and you get one spare
10:25 Smiley xmc: kill all my stuff
10:26 xmc nuke the account and everything in it?
10:26 Smiley yup
10:26 xmc ok
10:26 Sanqui what
10:26 xmc there are no files in your ~ anyway
10:26 Smiley nod
10:26 Sanqui we don't nuke things over here
10:27 Smiley not sure i even used that one?
10:27 xmc ok fine i will comment out the line in /etc/passwd and carefully move the ~ to somewhere in /var/backup
10:27 Sanqui excellent
10:27 Smiley lol k
10:28 Sanqui "it's empty" is also data
10:28 xmc just kidding
10:28 xmc deluser is fine
10:28 xmc this conversation records that it is empty
10:29 xmc oh hey there are hidden history files
10:29 xmc how the hell did you view redtube on this box
10:29 Sanqui lmao
10:29 xmc and why is it in your redis_cli history
10:30 xmc actually, switch the "how" and "why" in those two sentences
10:31 Sanqui [silence]
10:45 nertzy has joined #archiveteam
11:06 BlueMaxim has quit IRC (Read error: Connection reset by peer)
11:32 schbirid has joined #archiveteam
11:42 primus104 has joined #archiveteam
11:58 Sellyme_ has quit IRC (Ping timeout: 265 seconds)
11:58 Sellyme has joined #archiveteam
12:03 Ymgve has joined #archiveteam
12:26 primus104 has quit IRC (Read error: Connection reset by peer)
12:31 primus104 has joined #archiveteam
12:41 codinghor has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
12:44 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
12:45 dashcloud has joined #archiveteam
12:47 RuairiCOL has joined #archiveteam
12:48 RuairiCOL Hello all - a site I'm a fan of is at risk because its file download server has gone offline - I have a copy of the entire folder structure up until November 2014 and I'm uploading it to my own server for the moment, but I'm wondering if archiveteam have somewhere I can store it in addition just in case
12:48 RuairiCOL The site is hybridized.org and contains a load of DJ sets
12:49 RuairiCOL I've got around 190GB to upload
12:49 RuairiCOL SketchCow: *boop* I'm @rc55 from Twitter and the demoscene, I also run a small demoparty called Sundown :)
12:50 Ctrl-S Internet archive will want it
12:50 schbirid definitely IA material imo
12:53 RuairiCOL cool - the upload tool is a bit of a blunt instrument, is there any way I can ftp up the stuff considering the size? There are also cue files that have useful metadata
12:56 Smiley errr
12:56 Smiley i wasn't viewing redtube :D
12:56 Smiley we may have looked at archiving it D:
12:58 dashcloud has quit IRC (Read error: Connection reset by peer)
12:59 dashcloud has joined #archiveteam
13:02 Ctrl-S why not just zip it all?
13:02 Ctrl-S site.tar.gz or site.zip
13:11 RuairiCOL I'll work on that - I'll get this first upload done then upload it to archive.org next week (192gb at 512k p/sec...)
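For a sense of scale, the "192gb at 512k p/sec" figure can be turned into an upload time. The log does not say whether "512k" means kilobytes or kilobits per second, so the sketch below works out both readings under decimal units (1 GB = 10^9 bytes); both interpretations and the `upload_days` helper name are assumptions.

```python
def upload_days(size_bytes, rate_bytes_per_s):
    """Days needed to move size_bytes at a steady rate_bytes_per_s."""
    return size_bytes / rate_bytes_per_s / 86400  # 86400 seconds per day


SIZE = 192 * 10**9  # 192 GB, decimal units (assumption)

# Reading 1: 512 kilobytes per second.
as_kilobytes = upload_days(SIZE, 512 * 10**3)
# Reading 2: 512 kilobits per second, i.e. 64 kilobytes per second.
as_kilobits = upload_days(SIZE, 512 * 10**3 / 8)

print(f"at 512 kB/s:   {as_kilobytes:.1f} days")
print(f"at 512 kbit/s: {as_kilobits:.1f} days")
```

The first reading comes to a little over four days of continuous transfer and the second to about five weeks, which is why a resumable upload method matters more here than the choice of archive format.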
13:17 Ctrl-S ask the others before you bother doing anything
13:17 Ctrl-S i'm not a good source of advice
13:27 Kazzy RuairiCOL: archive.org does have a way of uploading as a torrent, which sounds like it's easier for you http://archiveteam.org/index.php?title=Internet_Archive#Uploading_to_archive.org
13:53 schbirid don't zip things like media files
13:54 schbirid you can upload with curl, s3cmd or some python/perl tools
13:55 Smiley and the iauploader script
14:34 Sanqui Is there something other than the WaybackMachine I can check for removed/expired pastebins?
14:37 trs80 that sounds like a good project for urlteam
14:37 Ctrl-S pastebins?
14:37 Ctrl-S i've got code for pastebin stuff
14:37 Ctrl-S sort of
14:38 Ctrl-S recursive download stuff
14:38 Sanqui well, I'm looking for one specific paste
14:38 Ctrl-S there's also a AT project for new pastes
14:38 Ctrl-S why?
14:38 Sanqui from 2013
14:38 Sanqui well, the paste "has been removed" and the author doesn't have it any more
15:07 Emcy has quit IRC (Read error: Connection reset by peer)
15:07 Emcy has joined #archiveteam
15:18 ohhdemgir has quit IRC (Read error: Operation timed out)
15:20 ohhdemgir has joined #archiveteam
15:22 Emcy has quit IRC (Read error: Connection reset by peer)
15:53 Emcy has joined #archiveteam
16:00 chfoo http://blog.postach.io/post/brand-new-version-launched-billing-changes
16:23 dashcloud has quit IRC (Read error: Connection reset by peer)
16:24 dashcloud has joined #archiveteam
16:39 Emcy has quit IRC (Ping timeout: 512 seconds)
16:48 mietek has joined #archiveteam
16:51 mietek Does anyone happen to have an archived copy of ftp://ftp.cpsc.ucalgary.ca/pub/projects/charity/ ?
16:51 mietek The web site is still online, but all the source/code links are dead: http://pll.cpsc.ucalgary.ca/charity1/www/home.html
16:54 ats mietek: https://synrc.com/publications/cat/Functional%20Languages/Charity/ ?
16:55 mietek ats: wow, cool.
16:56 mietek ats as in the ATS language?
16:57 ats no, ats as in my name ;-)
16:57 mietek Thanks. Was that just a good Google search?
16:58 ats yup; I searched for charity-src.tar.gz...
16:58 mietek Thanks.
17:55 mistym has joined #archiveteam
17:57 dashcloud has quit IRC (Ping timeout: 370 seconds)
17:57 dashcloud has joined #archiveteam
18:02 Jonimus has quit IRC (Ping timeout: 260 seconds)
18:07 rolfb has joined #archiveteam
18:11 mietek Unfortunately the example programs seem not to have been mirrored anywhere..
18:12 mietek (http://pll.cpsc.ucalgary.ca/charity1/www/examples.html)
18:14 balrog mietek: have you tried emailing them?
18:14 mietek Not yet. Will do.
18:15 mietek First collecting what I can before I start bothering people who apparently gave up on the thing 15 years ago.
18:15 mietek Sorry if this isn’t the right channel for these questions.
18:17 rolfb has quit IRC (Leaving...)
18:20 mietek I was hoping there’s some secret FTP analog of archive.org which someone here might know.
18:22 balrog mietek: unfortunately there isn't. people have started for FTPs, but it's a bit late
18:44 the_fox is now known as TheOtherF
18:45 TheOtherF is now known as OtherFox
18:57 lag2 has joined #archiveteam
19:01 lag has quit IRC (Ping timeout: 512 seconds)
19:12 robink has quit IRC (Quit: No Ping reply in 180 seconds.)
19:13 Start is there any good way to scrape yahoo automatically?
19:13 robink has joined #archiveteam
19:13 Start i normally save search engine pages manually and extract urls with regex, but in the case of google business sitebuilder there's over 100 result pages per domain
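The manual workflow described here, saving result pages and pulling URLs out with a regex, could be sketched as follows. The `extract_urls` helper, the `HREF_RE` pattern, and the optional host filter are hypothetical; a regex is a rough tool for HTML and will miss links that a real parser would find, so this is a minimal sketch rather than a robust scraper.

```python
import re
from html import unescape
from urllib.parse import urlsplit

# Matches absolute http(s) links inside href attributes. Hypothetical
# pattern for illustration; relative links are deliberately skipped.
HREF_RE = re.compile(r'href=["\'](https?://[^"\']+)["\']', re.IGNORECASE)


def extract_urls(html, host_suffix=None):
    """Return unique absolute URLs found in html, in first-seen order.

    If host_suffix is given, keep only URLs whose hostname ends with it.
    """
    urls = []
    for raw in HREF_RE.findall(html):
        url = unescape(raw)  # saved pages encode "&" as "&amp;"
        host = urlsplit(url).hostname or ""
        if host_suffix and not host.endswith(host_suffix):
            continue
        if url not in urls:
            urls.append(url)
    return urls
```

Run over each saved result page in a loop, this yields a deduplicated URL list per domain; with over 100 result pages per domain, fetching the pages is the part that still needs automating.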
19:19 winr4r Start: yes, use the bing API
19:35 xmc bling api
19:36 fenn Sanqui: https://archive.org/details/pastebinpastes goes back to 2013-10-30
20:18 yan has joined #archiveteam
20:50 bzc6p has joined #archiveteam
20:51 bzc6p Start: http://archiveteam.org/index.php?title=Site_exploration
20:57 mistym has quit IRC (Remote host closed the connection)
21:13 bzc6p has left
21:32 Emcy has joined #archiveteam
21:37 schbirid has quit IRC (Quit: Leaving)
21:49 Ravenloft has joined #archiveteam
22:31 Start where are the nokia memories archives? i can't find them anywhere
22:35 BlueMaxim has joined #archiveteam
22:53 serapeum has joined #archiveteam
22:56 Emcy_ has joined #archiveteam
22:56 Emcy has quit IRC (Ping timeout: 306 seconds)
22:58 lag2 has quit IRC (Read error: Operation timed out)
23:52 signius has quit IRC (Ping timeout: 306 seconds)
