#archiveteam-bs 2018-05-22,Tue

Time Nickname Message
00:13 🔗 godane ok then
00:13 🔗 godane sorry i screwed up the name
00:15 🔗 godane anyways that part of the tape had very bad tracking
01:14 🔗 fie has quit IRC (Ping timeout: 252 seconds)
01:28 🔗 zhongfu has quit IRC (Ping timeout: 260 seconds)
01:52 🔗 ta9le has quit IRC (Quit: Connection closed for inactivity)
01:59 🔗 robink has quit IRC (Read error: Connection reset by peer)
02:06 🔗 robink has joined #archiveteam-bs
02:10 🔗 Odd0002 has quit IRC (Quit: ZNC - http://znc.in)
02:12 🔗 Odd0002 has joined #archiveteam-bs
02:19 🔗 Odd0002_ has joined #archiveteam-bs
02:20 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
02:20 🔗 Odd0002_ is now known as Odd0002
02:45 🔗 Odd0002 has quit IRC (Quit: ZNC - http://znc.in)
02:47 🔗 Odd0002 has joined #archiveteam-bs
03:30 🔗 alembic has joined #archiveteam-bs
03:33 🔗 qw3rty113 has joined #archiveteam-bs
03:38 🔗 qw3rty112 has quit IRC (Ping timeout: 600 seconds)
03:53 🔗 zino has quit IRC (Read error: Operation timed out)
04:43 🔗 godane so i'm getting my box of tapes i have bought tomorrow
04:46 🔗 robink has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
05:03 🔗 robink has joined #archiveteam-bs
05:50 🔗 alembic has quit IRC (Quit: Connection closed for inactivity)
06:01 🔗 zhongfu has joined #archiveteam-bs
06:02 🔗 DragonMon has joined #archiveteam-bs
06:18 🔗 bsmith093 has quit IRC (Read error: Operation timed out)
06:23 🔗 bsmith093 has joined #archiveteam-bs
07:25 🔗 DragonMon has quit IRC (Quit: Leaving)
07:30 🔗 schbirid has joined #archiveteam-bs
07:34 🔗 DragonMon has joined #archiveteam-bs
07:41 🔗 DragonMon has quit IRC (Quit: Leaving)
07:41 🔗 DragonMon has joined #archiveteam-bs
07:45 🔗 ta9le has joined #archiveteam-bs
07:50 🔗 DragonMon has quit IRC (Quit: Leaving)
08:23 🔗 DragonMon has joined #archiveteam-bs
08:56 🔗 davidar has joined #archiveteam-bs
09:57 🔗 REiN^ has joined #archiveteam-bs
10:12 🔗 SmileyG_ https://dnshistory.org/historical-dns-records/a/screencast-o-matic.com. -- check out the message to us, hahaha
10:26 🔗 BlueMax >this as abuse
10:26 🔗 BlueMax the only thing getting abused here is the English language
10:32 🔗 eientei95 >They are quite open about their attitude of ignoring robots.txt files
10:32 🔗 eientei95 >make no attempt to block any archiver in their robots.txt
10:34 🔗 JAA As far as I know, there was an AT project to archive DNSHistory in its entirety a while ago (2016?). They did block us then.
10:34 🔗 JAA They also activated CloudFlare's "I'm under attack" mode, I believe.
10:35 🔗 JAA I wasn't part of that effort though, so I might be wrong. There was definitely a project though, and it failed hard because the site operator did everything they could to stop it.
10:39 🔗 JAA SmileyG_, BlueMax, eientei95: https://archiveteam.org/index.php?title=DNS_History#Archiving
10:42 🔗 SmileyG_ ew
10:42 🔗 SmileyG_ I wonder if we can do some kinda manual effort D:
10:42 🔗 BlueMax they had a billion pages on that site? jesus
10:42 🔗 JAA They still do, I think.
10:43 🔗 JAA They announced their shutdown two years ago, but they're still online, so...
10:43 🔗 SmileyG_ they a pita.
10:43 🔗 SmileyG_ is now known as SmileyG
10:45 🔗 SmileyG How often does cloudflare make you re-auth a browser?
10:45 🔗 svchfoo1 sets mode: +o SmileyG
10:45 🔗 SmileyG i.e. could we run something 'in browser' that does what archive.org does, do the 'auth' then let it run for a bit, reauth, etc.
10:46 🔗 JAA Depends on how it's configured. I've grabbed sites where I just had to pass the challenge once and could then fire away.
10:46 🔗 Despatche people taking it upon themselves to erase history is exactly why archive team exists
10:47 🔗 JAA On other sites, I had to resolve the challenge every few requests to every few hours.
10:47 🔗 JAA Fortunately, it's fairly easy to bypass the "just checking your browser" thing.
10:47 🔗 JAA When they force a reCAPTCHA, it becomes essentially impossible.
10:50 🔗 SmileyG yah
10:50 🔗 SmileyG few hours isn't too bad.
10:50 🔗 SmileyG I can leave it running and check on it occasionally.
10:51 🔗 odemg has joined #archiveteam-bs
10:51 🔗 SmileyG hell, we could prob even make a warrior project which just displays the captcha page when it happens? :/
10:51 🔗 JAA The "just checking your browser" page can be circumvented automatically.
10:52 🔗 SmileyG well, even better.
10:52 🔗 BlueMax has quit IRC (Leaving)
10:52 🔗 JAA The standard approach uses NodeJS to evaluate arbitrary JS (WCGW?). joepie91 implemented a parser instead which evaluates the challenge directly, and I ported that to Python a while ago.
10:53 🔗 JAA "evaluates the challenge directly" meaning that it doesn't execute anything; it just calculates the correct response directly. The format of the challenge code hasn't changed in years.
10:54 🔗 JAA But if the site operator wants to stop us, they absolutely can do that. They just need to enable the captcha.
10:54 🔗 SmileyG yah
10:55 🔗 SmileyG but i'm saying, crawl until we get the captcha, display the captcha, solve it, carry on...
10:55 🔗 JAA Yeah, that could be done I guess.
10:56 🔗 JAA We could centralise that as well, i.e. send the captcha image to somewhere, and any user can solve it.
10:57 🔗 JAA Just like those services where you pay some money per solved captcha.
10:57 🔗 JAA (Aka slavery)
11:08 🔗 eientei95 Nah, slaves never got paid
11:08 🔗 eientei95 This is more like... I got nothing. Slaves it is
11:11 🔗 eientei95 >If I eventually decide to move from this forum, my biggest concern is how to exactly preserve this old one. Tapatalk has an inactivity policy where if there has been no posts in 90 days, the forum can get purged which is a massive problem. I've also heard Tapatalk staff has also deleted posts that they don't agree with or believe violate their Code of Conduct which is another issue. And the cherry on the cake... I can't download a backup
11:11 🔗 eientei95 database of this forum at the moment despite it being requested for years.
11:18 🔗 ta9le Yeah, the guy whom you're quoting is my fellow admin of those forums
11:19 🔗 ta9le but yeah, #zetatheplank, cuz it exists
11:59 🔗 Stilett0- has joined #archiveteam-bs
12:19 🔗 Stilett0- has quit IRC ()
12:29 🔗 JAA I just saw that the FCC net neutrality comments are in the news again, and I wonder if there is any archive of these yet and, if not, if it's worthwhile grabbing them.
12:29 🔗 JAA I couldn't find anything quickly on IA.
12:34 🔗 JAA A few filings which got a lot of attention were archived in the Wayback Machine of course (e.g. "Obama"'s comment at https://www.fcc.gov/ecfs/filing/1051157755251 ), but I'm talking about the entire dataset.
12:39 🔗 eientei95 https://www.fcc.gov/ecfs/search/filings?q=%22The%20unprecedented%20regulatory%20power%20the%20Obama%20Administration%20imposed%22&sort=date_disseminated,DESC
12:40 🔗 eientei95 JAA:
12:40 🔗 eientei95 The filings are provided in JSON files, in batches of 10,000, compressed into three archive files. The public can access the data here:
12:40 🔗 eientei95 https://data.fcc.gov/comments/17-108/ECFS_17-108_1.zip
12:40 🔗 eientei95 https://data.fcc.gov/comments/17-108/ECFS_17-108_2.zip
12:40 🔗 eientei95 https://data.fcc.gov/comments/17-108/ECFS_17-108_3.zip
12:43 🔗 JAA eientei95: Those zips only contain the data up to half a year ago according to the notice on the website. And yeah, grabbing them should be easy enough.
12:44 🔗 eientei95 Right
12:46 🔗 JAA Also, filings can have attached documents. I'm not sure if those are included in the zips. (The download's pretty slow, so I can't check right now.)
12:47 🔗 JAA Although most filings will probably not have attachments.
12:51 🔗 godane so one of the tapes i got has some hbo promos
12:51 🔗 godane there is HBO News segment about space jam
12:52 🔗 eientei95 ...
12:52 🔗 eientei95 JAA: These include the person's contact email
12:53 🔗 eientei95 and physical address
12:54 🔗 JAA eientei95: Yeah, that's true. I'm not sure if this is a problem though since the data is already publicly available (and people agree to have it published when submitting a comment, I believe).
12:56 🔗 eientei95 Their email address isn't normally accessible via the web interface
13:09 🔗 eientei95 JAA:
13:09 🔗 eientei95 'documents': [{'description': 'Opposition comment', 'filename': 'Internet Service Providers comment.docx', 'src': 'https://ecfsapi.fcc.gov/file/DOC-56f1752599000000-A.docx'}],
13:11 🔗 eientei95 Site checks the referrer
13:22 🔗 JAA eientei95: Yeah, but the actual documents aren't in the zip, right?
13:22 🔗 eientei95 Nope
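
Note on the exchange above: a minimal Python sketch of how one might list the attachment URLs referenced in the zipped JSON batches, since the documents themselves are not in the zips. It assumes each JSON member holds either a list of filings or a single filing object; the log does not say which. The 'documents' and 'src' keys come from the sample eientei95 pasted. The function names and the Referer value are hypothetical, and nothing here was run against the FCC servers.

```python
import io
import json
import zipfile
import urllib.request


def iter_filings(zip_bytes):
    """Yield filing dicts from every .json member of a zip archive.

    Assumption: each member is a JSON list of filings or one filing
    object. The real layout of the FCC batches is not given in the log.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".json"):
                continue
            data = json.loads(archive.read(name).decode("utf-8"))
            if isinstance(data, dict):
                data = [data]
            for filing in data:
                yield filing


def attachment_urls(filings):
    """Collect the 'src' URL of every entry in a filing's 'documents' list."""
    urls = []
    for filing in filings:
        for doc in filing.get("documents") or []:
            if "src" in doc:
                urls.append(doc["src"])
    return urls


def build_request(url, referer):
    """Build (but do not send) a GET request that carries a Referer header.

    Which Referer value the site accepts is an assumption to be tested.
    """
    return urllib.request.Request(url, headers={"Referer": referer})
```

The request object would be passed to urllib.request.urlopen to actually fetch a document; keeping that step separate makes it easier to rate-limit or to hand the URL list to another tool instead.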
13:32 🔗 Jens has quit IRC (Remote host closed the connection)
13:32 🔗 Jens has joined #archiveteam-bs
13:35 🔗 godane so i got a tape with 6 hours worth of content
13:35 🔗 godane this AMC recording on it
13:36 🔗 godane maybe of amc airing of Warlock
13:36 🔗 godane it was recorded from 1996-12 or 1997-12
13:49 🔗 wp494 has quit IRC (Ping timeout: 492 seconds)
13:50 🔗 wp494 has joined #archiveteam-bs
13:50 🔗 svchfoo3 sets mode: +o wp494
14:36 🔗 robink has quit IRC (Ping timeout: 506 seconds)
15:01 🔗 antomati_ is now known as antomatic
15:25 🔗 ta9le Hot damn
16:06 🔗 zino has joined #archiveteam-bs
16:08 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
16:09 🔗 Mateon1 has joined #archiveteam-bs
17:12 🔗 Sk2d has joined #archiveteam-bs
17:13 🔗 robink has joined #archiveteam-bs
17:13 🔗 Sk1d has quit IRC (Read error: Operation timed out)
17:13 🔗 Sk2d is now known as Sk1d
18:02 🔗 jschwart has joined #archiveteam-bs
18:38 🔗 verifiedj has joined #archiveteam-bs
19:03 🔗 verifiedj has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
19:51 🔗 Stilett0- has joined #archiveteam-bs
20:45 🔗 schbirid has quit IRC (Quit: Leaving)
20:45 🔗 wacky has joined #archiveteam-bs
21:27 🔗 CoolCanuk has joined #archiveteam-bs
21:56 🔗 Mateon1 has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 dxrt_ has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 Gfy has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 Aoede has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 espes___ has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 Rai-chan has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 svchfoo1 has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 omglolbah has quit IRC (se.hub efnet.portlane.se)
21:58 🔗 REiN^ has quit IRC (Read error: Operation timed out)
22:12 🔗 Mateon1 has joined #archiveteam-bs
22:12 🔗 dxrt_ has joined #archiveteam-bs
22:12 🔗 Gfy has joined #archiveteam-bs
22:12 🔗 Aoede has joined #archiveteam-bs
22:12 🔗 espes___ has joined #archiveteam-bs
22:12 🔗 Rai-chan has joined #archiveteam-bs
22:12 🔗 svchfoo1 has joined #archiveteam-bs
22:12 🔗 omglolbah has joined #archiveteam-bs
22:12 🔗 efnet.portlane.se sets mode: +oo dxrt_ svchfoo1
22:12 🔗 dxrt sets mode: +o dxrt_
22:12 🔗 godane so i'm capturing a SP tape of Michal juior
22:14 🔗 godane *Michael junior
22:15 🔗 godane i'm doing it at 6000k for video cause at 10000k the signal gives capture bitrate at about 4700k
23:03 🔗 Asparagir has joined #archiveteam-bs
23:03 🔗 svchfoo3 sets mode: +o Asparagir
23:19 🔗 Stilett0- is now known as Stiletto
23:29 🔗 Stiletto has quit IRC ()
23:30 🔗 Stilett0- has joined #archiveteam-bs
23:39 🔗 megaminxw has joined #archiveteam-bs
23:39 🔗 megaminxw hmm, efnets having a tantrum or something
23:39 🔗 megaminxw anyway
23:40 🔗 megaminxw im currently attempting to archive a website (kungfoomanchu.com) using wget, but even though i explicitly add --mirror, it only ever downloads the front page
23:40 🔗 megaminxw there are a bunch of pdfs on there that it doesnt even touch
23:40 🔗 JAA "efnets having a tantrum" What else is new? ;-)
23:41 🔗 JAA Look into --recursive and --page-requisites.
23:41 🔗 megaminxw i think it might be something to do with the site being all fancy and javascripty-single page-stupidness
23:41 🔗 JAA Ah yeah, JS is hell.
23:41 🔗 megaminxw yeah, ive put those in, but it just downloads the front page
23:41 🔗 megaminxw there are even links on the front page to a few pdfs, with no javascript there at all, and it just ignores them
23:42 🔗 megaminxw im mildly annoyed to put it politely
23:42 🔗 JAA Does it grab the images?
23:42 🔗 JAA E.g. http://www.kungfoomanchu.com/images/333.png
23:42 🔗 megaminxw it grabs some of them
23:43 🔗 JAA Yeah, that doesn't surprise me. Most of the content is loaded through JS.
23:43 🔗 JAA The site isn't usable at all with JS disabled.
23:44 🔗 megaminxw so how would one go about archiving this? tbh its just for my own collecting habits, i dont think its really in danger
23:44 🔗 JAA You could try warcprox and either manual recursion in a browser or brozzler. (I haven't used brozzler myself, so I can't tell you how well that works.)
23:44 🔗 megaminxw hmm okay
23:44 🔗 JAA "manual recursion" meaning "click through everything in tabs", of course.
23:44 🔗 JAA There is no good option for heavily scripted sites which load content from the server asynchronously, unfortunately.
23:45 🔗 robink has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
23:45 🔗 megaminxw thanks javascript~ really appreciate it
23:45 🔗 megaminxw bergh
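
Note on the exchange above: megaminxw mentions plain links to PDFs on the front page that wget skipped. As a stopgap, a small Python sketch like the following could pull those links out of saved HTML so they can be fetched separately. It only sees links present in the static markup, so anything loaded through JS is still missed. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
                if is_pdf and absolute not in self.pdf_urls:
                    self.pdf_urls.append(absolute)


def find_pdf_links(html, base_url):
    """Return the de-duplicated PDF links found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls
```

The resulting list could be written to a text file, one URL per line, and passed to wget with --input-file.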
23:57 🔗 robink has joined #archiveteam-bs
