#archiveteam-bs 2018-08-28,Tue


Time Nickname Message
00:01 bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
00:37 BlueMax has joined #archiveteam-bs
01:46 purplebot has quit IRC (Read error: Operation timed out)
01:46 PurpleSym has quit IRC (Read error: Operation timed out)
01:53 purplebot has joined #archiveteam-bs
01:54 PurpleSym has joined #archiveteam-bs
02:14 Raccoon has quit IRC (Remote host closed the connection)
02:14 Raccoon has joined #archiveteam-bs
02:19 odemg has quit IRC (Ping timeout: 260 seconds)
02:32 odemg has joined #archiveteam-bs
03:22 odemg has quit IRC (Ping timeout: 260 seconds)
03:34 odemg has joined #archiveteam-bs
04:12 Arctic has joined #archiveteam-bs
04:14 kiska [2018-08-28 04:13:08] <Arctic> We should probably archive http://hiddenpalace.org/ as it contains a lot of significant prototypes of games and Nintendo as of late has been on a high-profile crusade against ROMs.
04:15 kiska We likely won't get the ROMs but will get the pages relating to the ROMs
04:15 Flashfire put it in the archive bot because Mwuhahahahaha
04:15 Flashfire The roms are the important part
04:15 Flashfire here
04:25 Flashfire Kiska the roms are the important part here in my opinion
04:25 Flashfire however it would probably be best if someone grabbed them and uploaded them to the archive rather than through the Wayback Machine
04:26 kiska ROMs are important if they can't be found elsewhere
04:26 Flashfire These ones are
04:27 kiska Alright let's see how archivebot handles the downloads
04:27 Flashfire they are literally the only places you can find 90% of this stuff
04:27 kiska Should be easy since the download links are in the plain
04:28 kiska I don't see any JavaScript wrapping it
04:47 Arctic Thanks.
04:51 Arctic kiska: So we're using ArchiveBot to archive the site and ROMs?
05:16 godane i'm seeing about making a hiddenpalace.org warc
05:16 godane this cause i'm going to try to extract the rom images urls from it
05:16 godane then we have a list that i can do a !ao < pastebin of it
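A minimal sketch of the URL-list step godane describes, assuming the page HTML has already been read out of the WARC records. The extension list and the name extract_rom_urls are illustrative assumptions, not anything confirmed about how hiddenpalace.org names its downloads.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed file extensions for ROM downloads; adjust after inspecting the site.
ROM_EXTENSIONS = (".zip", ".7z", ".rar", ".bin")


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_rom_urls(html, base_url):
    """Return sorted, de-duplicated absolute URLs whose path ends in a ROM extension."""
    collector = _LinkCollector()
    collector.feed(html)
    urls = set()
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links against the page URL
        if urlparse(absolute).path.lower().endswith(ROM_EXTENSIONS):
            urls.add(absolute)
    return sorted(urls)
```

Printing the result one URL per line would give a list that could be pasted somewhere and handed to !ao <, as suggested above.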
05:22 Arctic Alright. Where is it going to be hosted?
05:25 Flashfire The internet archive if things all go right
05:28 Arctic Sounds good.
05:28 Arctic Wayback Machine?
05:28 chferfa has quit IRC ()
05:30 Flashfire Depends
06:06 kiska godane I would not do that since archivebot should get the roms as well as everything it has on the pages unless there is severe JavaScript obfuscation
06:07 kiska And also do it after we finish the archivebot job since I am going to assume it will tax their server
07:00 Arctic has quit IRC (Quit: Page closed)
07:56 Mateon1 has quit IRC (Ping timeout: 268 seconds)
07:56 Mateon1 has joined #archiveteam-bs
09:24 purplebot has quit IRC (Remote host closed the connection)
09:24 PurpleSym has quit IRC (Quit: *)
09:25 PurpleSym has joined #archiveteam-bs
09:35 caff has quit IRC (Read error: Connection reset by peer)
10:41 bitBaron has joined #archiveteam-bs
10:49 bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
10:50 odemg has quit IRC (Ping timeout: 260 seconds)
11:01 faolingfa What is the general experience with server load caused by archiving? I am used to thinking of web crawling as a "drop in the ocean" kind of load but I hear/see a fair bit of concern about server load and rate limiting so I wonder if my impression is inaccurate.
11:02 odemg has joined #archiveteam-bs
11:02 BlueMaxim has joined #archiveteam-bs
11:03 kiska We are gonna be taxing their servers with >2 connections per 200 ms. Which might strain their connection
11:04 kiska For a warrior project, we might be hitting the server with >1000 connections so if their pipeline is not big enough then it's gonna over their pipeline. If processing power is insufficient then we are going to get error code 500s or some other code to tell us we are overloading their server
11:07 kiska overload*
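A rough sketch of how a crawler could react to the overload signals kiska mentions (HTTP 500-range responses, or 429 from a rate limiter). The function name next_delay and all of the numbers are assumptions chosen for illustration; this is not how ArchiveBot or the warrior actually schedule requests.

```python
def next_delay(status, delay, base=0.5, factor=2.0, ceiling=300.0):
    """Return the pause in seconds before the next request to the same host.

    On 429 or any 5xx status the pause is multiplied by `factor`, capped at
    `ceiling`. On any other status it shrinks back toward `base`.
    """
    if status == 429 or 500 <= status <= 599:
        return min(max(delay, base) * factor, ceiling)
    return max(base, delay / factor)


if __name__ == "__main__":
    # Replay a made-up sequence of response codes and show the resulting pauses.
    delay = 0.5
    for status in (200, 500, 500, 429, 200, 200):
        delay = next_delay(status, delay)
        print(status, delay)
```

Keeping the rule as a pure function of the last status and the current pause makes it easy to test without touching a real server.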
11:07 BlueMax has quit IRC (Read error: Operation timed out)
11:26 zino has quit IRC (Remote host closed the connection)
11:27 zino has joined #archiveteam-bs
11:29 chr1sm Is that site RIP now? I can't connect from here, might be firewall though...
11:31 BlueMaxim has quit IRC (Read error: Connection reset by peer)
11:34 zino Tom's Hardware has been going through some internal struggles while switching from being a hardcore PC tech site to being a SEO driven click bait farm. They are now also apparently being sold.
11:34 zino I don't know how well their stuff is already covered in the archive, so it might be worth having a look at if someone else has some time.
11:36 zino They have 379 videos on their Youtube channel that are probably not covered by archives.
11:45 zino Oh great. http://www.tomshardware.com/ => "This URL has been excluded from the Wayback Machine."
11:45 zino The German version seems to be crawled every week though.
12:03 JAA faolingfa: Depends a lot on the website, obviously. It's often more a matter of the sites rate-limiting us even if they have the resources to serve us simply because the sysadmin configured it that way. Another concern in some cases is the amount of traffic caused; small sites might have low traffic caps.
12:04 JAA chr1sm: hiddenpalace.org works fine here.
12:30 kiska zino tom's youtube? I am going to chuck it into tubeup
13:06 bitBaron has joined #archiveteam-bs
13:11 kiska has quit IRC (Remote host closed the connection)
13:11 kiskabak2 has quit IRC (Remote host closed the connection)
13:11 Flashfire has quit IRC (Remote host closed the connection)
13:11 kiskaBak has quit IRC (Remote host closed the connection)
13:12 kiska has joined #archiveteam-bs
13:12 kiskabak2 has joined #archiveteam-bs
13:13 Flashfire has joined #archiveteam-bs
13:13 w0rmhole has joined #archiveteam-bs
13:13 kiskaBak has joined #archiveteam-bs
13:16 bitBaron has quit IRC (Ping timeout: 480 seconds)
14:04 Pixi has quit IRC (Quit: Pixi)
14:07 Pixi has joined #archiveteam-bs
14:32 zino kiska, https://www.youtube.com/user/TomsHardware
14:38 schbirid has joined #archiveteam-bs
14:39 kiska Ok I threw it into tubeup
14:50 zino \o/
14:55 wp494 has quit IRC (Read error: Operation timed out)
14:55 wp494 has joined #archiveteam-bs
15:08 Muad-Dib has joined #archiveteam-bs
15:09 svchfoo1 sets mode: +o Muad-Dib
15:42 faolingfa I am trying to familiarize myself with wpull. Is this up to date? https://wpull.readthedocs.io/en/master/install.html I ask because I get some fairly abrupt error right after installing it ("successfully") via pip: ImportError: cannot import name 'SSLCertificateError'
15:46 JAA faolingfa: You need Tornado 4.x, not 5.x. And also html5lib==0.9999999, not a higher version.
15:46 JAA You'll also want to use either wpull 1.2.3 or FalconK's (or my) fork. Version 2.0.1 is very unstable and hardly usable.
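A small sketch for checking an environment against the versions JAA lists, assuming the input is text in the name==version format that pip freeze prints. The function name check_wpull_pins is made up for illustration, and the check only accepts wpull 1.2.3, so one of the forks JAA mentions would be reported as a mismatch even though he recommends them.

```python
def check_wpull_pins(freeze_text):
    """Return a list of mismatches found in `pip freeze`-style text."""
    installed = {}
    for line in freeze_text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep:  # skip lines that are not plain name==version pins
            installed[name.lower()] = version

    problems = []
    tornado = installed.get("tornado")
    if tornado is None or tornado.split(".")[0] != "4":
        problems.append("tornado: want 4.x, found {}".format(tornado))
    html5lib = installed.get("html5lib")
    if html5lib != "0.9999999":
        problems.append("html5lib: want 0.9999999, found {}".format(html5lib))
    wpull = installed.get("wpull")
    if wpull != "1.2.3":
        problems.append("wpull: want 1.2.3, found {}".format(wpull))
    return problems
```

An empty list means the three pins match; anything else names the package that differs.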
15:49 faolingfa Oh man
15:49 faolingfa Where's an exe file when you need one
15:56 offline_c has joined #archiveteam-bs
15:58 faolingfa https://launchpad.net/wpull/+download oh, here is an exe file! :)
15:58 JAA Oh yeah, that weird binary. Caused plenty of strange errors over at Newsgrabber before.
16:00 JAA How about using a proper OS? ;-)
16:03 offline_c J. Kenji López-Alt, a food blogger, is closing his Facebook page next week. See: https://m.facebook.com/story.php?story_fbid=1248348595307553&id=630532740422478
16:03 offline_c Is there a good way to back that up before it's lost?
16:05 JAA offline_c: There is no really "good" way, but I'm scraping it for posts now and will throw those into ArchiveBot later. Better than nothing at least. Won't grab all the comments etc. though.
16:09 offline_c JAA: thanks.
16:19 bitBaron has joined #archiveteam-bs
16:27 offline_c has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
16:54 zino has quit IRC (Remote host closed the connection)
16:54 zino has joined #archiveteam-bs
16:58 chferfa has joined #archiveteam-bs
17:56 caff has joined #archiveteam-bs
18:20 caff_ has joined #archiveteam-bs
18:27 caff has quit IRC (Read error: Operation timed out)
19:08 bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
19:17 bitBaron has joined #archiveteam-bs
19:39 Mateon1 has quit IRC (Remote host closed the connection)
19:46 Raccoon has quit IRC (Remote host closed the connection)
19:46 Raccoon has joined #archiveteam-bs
19:48 Mateon1 has joined #archiveteam-bs
20:08 ndiddy has joined #archiveteam-bs
20:30 chferfa has quit IRC ()
20:45 ndiddy has quit IRC (Ping timeout: 252 seconds)
21:03 ppsym has joined #archiveteam-bs
21:05 Flashfire has quit IRC (Ping timeout: 252 seconds)
21:05 w0rmhole has quit IRC (Ping timeout: 252 seconds)
21:05 kiskaBak has quit IRC (Ping timeout: 252 seconds)
21:05 PurpleSym has quit IRC (Ping timeout: 252 seconds)
21:05 i0npulse has quit IRC (Ping timeout: 252 seconds)
21:05 Frogging has quit IRC (Ping timeout: 252 seconds)
21:05 hook54321 has quit IRC (Ping timeout: 252 seconds)
21:05 ppsym is now known as PurpleSym
21:06 Flashfire has joined #archiveteam-bs
21:06 medowar has joined #archiveteam-bs
21:06 i0npulse has joined #archiveteam-bs
21:07 w0rmhole has joined #archiveteam-bs
21:07 kiskaBak has joined #archiveteam-bs
21:07 hook54321 has joined #archiveteam-bs
21:08 Frogging has joined #archiveteam-bs
21:54 BlueMax has joined #archiveteam-bs
22:21 ndiddy has joined #archiveteam-bs
22:48 ndiddy has quit IRC (Ping timeout: 255 seconds)
22:53 schbirid has quit IRC (Remote host closed the connection)
23:28 Sk1d has joined #archiveteam-bs
