#archiveteam-bs 2017-09-14,Thu


Who What When
***pizzaiolo has quit IRC (Quit: pizzaiolo)
pizzaiolo has joined #archiveteam-bs
jrra has joined #archiveteam-bs
[00:16]
jrraoops [00:18]
***jrra has left part [00:18]
LagittajaI've been running the Hyper-V warrior for a while now Soulflare, working great [00:29]
***drumstick has quit IRC (Read error: Operation timed out) [00:40]
BlueMaxim has joined #archiveteam-bs [00:45]
......... (idle for 41mn)
Lagittaja has quit IRC (Quit: Leaving)
pizzaiolo has quit IRC (Quit: pizzaiolo)
[01:26]
..... (idle for 23mn)
drumstick has joined #archiveteam-bs [01:49]
odemg has quit IRC (Read error: Operation timed out) [01:59]
odemg has joined #archiveteam-bs [02:08]
.... (idle for 19mn)
Soulflareoh boy Xfire was archived, I had thought it was lost [02:27]
hook54321I thought we grabbed all of geocities awhile ago o_O [02:38]
astridby no means all!
we ran out of time, and we weren't very organized
remember, geocities was our very first project
[02:38]
ndiddyone of the stranger things about geocities was that users who were subscribed to geocities plus at the time of the shutdown had their sites up for a few more years [02:41]
SoulflareWas it maybe a separate cluster of servers? Seems odd [02:42]
ndiddyhell, the geocities clipart directory is still up
ex: http://www.geocities.com/clipart/pbi/backgrounds/Template_Pages/personalb.gif
[02:42]
SoulflareIs it a redirect unless exact path is found? [02:44]
ndiddyyeah, used to be a directory listing before yahoo screwed around with their webhosting division [02:45]
SoulflareThat's a pain :/ [02:45]
ndiddylook at that, some guy's geocities page is still up http://us.1.p.geocities.com/@bei-tech.com/investor/corp_gov_compensation.htm
the clear 10x10 gif that yahoo pagebuilder used is also still up http://www.geocities.com/clipart/pbi/c.gif
[02:45]
***drumstick has quit IRC (Read error: Connection reset by peer) [02:52]
hook54321lol [02:56]
***drumstick has joined #archiveteam-bs [02:59]
Asparagir has joined #archiveteam-bs [03:05]
hook54321I don't think this is a geocities site... http://us.1.p.geocities.com/@/ [03:05]
ndiddyi'd guess they use yahoo business hosting [03:10]
***hook54321 sets mode: +o Asparagir [03:12]
xarphgeocities might be like pre-internet aol where the servers are still running and the sysadmins have turned over so many times there's literally no one who knows where these servers are or how they work [03:20]
SketchCowAhem.
There are a couple Geocities sites still up
The reason for this is that they paid to subscribe to a certain other Yahoo! web hosting service, so it keeps the sites up in both domains
[03:32]
hook54321I just discovered that you can run VirtualBox headless [03:43]
MrRadarI haven't seen it mentioned here, but SketchCow recently started a Patreon for a weekly podcast: https://www.patreon.com/textfiles [03:52]
.......... (idle for 49mn)
hook54321I'm not sure how likely this is, but we should ask a service like https://whois.domaintools.com/ if they would be willing to donate free access to us. They have reverse whois, historical whois, etc. [04:41]
***Sk1d has quit IRC (Ping timeout: 194 seconds) [04:49]
Sk1d has joined #archiveteam-bs [04:55]
............ (idle for 59mn)
Asparagir has quit IRC (Asparagir) [05:54]
.............. (idle for 1h7mn)
Soni has quit IRC (Ping timeout: 272 seconds)
Soni has joined #archiveteam-bs
[07:01]
kevinr has quit IRC (Read error: Operation timed out) [07:07]
hook54321I got a reply from the imgh.us guy
https://www.irccloud.com/pastebin/Qx4iJWyw/
[07:15]
....... (idle for 34mn)
***kevinr has joined #archiveteam-bs [07:49]
ranmaxarph: cue bash.org quote?
well, nearly
[07:54]
***trvz has joined #archiveteam-bs [07:54]
ranmahook54321: looks like if you knew it before nov 2016, you're thinking about something else
re: apollo
[07:57]
hook54321what do you mean?
oh
[07:57]
ranmathat's when it changed from xanax to apollo
or there was/is overlap
[07:58]
hook54321I remember it being called xanax.
I don't think my account works anymore unfortunately though, I stopped using it for awhile.
[07:58]
ranmaah. i only migrated when good ole what went down :/ [07:58]
hook54321I think I may have joined right after they changed their name to apollo [07:59]
ranmai'm sure they'll reactivate you (unless you cheated or something or had terrible ratio)... big competition between them and RED [07:59]
hook54321Not sure how to ask them to reactivate me, also not sure what RED is. [08:00]
ranmaredacted [08:00]
hook54321ah. another torrent tracker? [08:00]
ranmamusic-focused [08:00]
hook54321Apollo is music focused as well, right?
JAA: See the response I got from the owner of imgh.us above.
[08:01]
ranmayep @ APL [08:03]
JAAhook54321: Nice. I see that the imgh.us domain itself is back as well. Let's get him to send you the list and throw it into ArchiveBot.
2M images shouldn't take too long anyway.
We can worry about the redirect to the archived version later. That shouldn't be hard, it's basically just 'redirect http://imgh.us/X to https://web.archive.org/web/date/http://imgh.us/X'. I think someone (jrwr?) set up something similar for Eroshare recently.
[08:09]
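The redirect JAA describes above can be sketched roughly as follows. This is only an illustration of the URL mapping, not whatever was actually set up for Eroshare: the function name, the `WAYBACK_PREFIX` constant and the default timestamp `"2017"` are assumptions standing in for the 'date' placeholder in the message.

```python
from urllib.parse import urlsplit

# Assumed prefix, taken from the URL pattern quoted in the chat.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_redirect_target(requested_url: str, timestamp: str = "2017") -> str:
    """Map http://imgh.us/X to the Wayback Machine URL for the same path.

    `timestamp` is a hypothetical stand-in for the 'date' part of the
    pattern; the real value would depend on when the grab was made.
    """
    parts = urlsplit(requested_url)
    if parts.hostname != "imgh.us":
        raise ValueError(f"not an imgh.us URL: {requested_url!r}")
    path = parts.path or "/"
    original = f"http://imgh.us{path}"
    if parts.query:
        original += "?" + parts.query
    return f"{WAYBACK_PREFIX}/{timestamp}/{original}"
```

A web server on the imgh.us domain would answer each request with an HTTP redirect to the returned URL; how that is wired up (server config, small app) is left open here.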
hook54321It's only 699,927 images [08:18]
I'm hoping I don't get caught by his spam filter again. [08:25]
.... (idle for 15mn)
***luckcolor has quit IRC (Read error: Operation timed out)
MrRadar2 has quit IRC (Read error: Operation timed out)
bluesoul has quit IRC (Read error: Operation timed out)
luckcolor has joined #archiveteam-bs
tsr has quit IRC (Read error: Operation timed out)
bluesoul has joined #archiveteam-bs
MrRadar2 has joined #archiveteam-bs
Honno has joined #archiveteam-bs
[08:40]
tsr has joined #archiveteam-bs [09:00]
Dimtree has quit IRC (Read error: Operation timed out) [09:06]
JAAAh right, confused it with the number of URLs at x.vu. [09:10]
......... (idle for 43mn)
***Dimtree has joined #archiveteam-bs [09:53]
............ (idle for 55mn)
Soni has quit IRC (Ping timeout: 272 seconds)
mls has quit IRC (Ping timeout: 250 seconds)
[10:48]
mls has joined #archiveteam-bs
BlueMaxim has quit IRC (Quit: Leaving)
[11:01]
.... (idle for 15mn)
drumstick has quit IRC (Ping timeout: 600 seconds)
drumstick has joined #archiveteam-bs
[11:18]
drumstick has quit IRC (Ping timeout: 255 seconds)
pizzaiolo has joined #archiveteam-bs
Soni has joined #archiveteam-bs
[11:27]
........ (idle for 35mn)
Lagittaja has joined #archiveteam-bs [12:05]
Dimtree has quit IRC (Peace) [12:13]
mls has quit IRC (Ping timeout: 250 seconds)
Dimtree_ has joined #archiveteam-bs
[12:20]
Lagittajaso, here's a question (I'm not looking for a guide, I can figure this stuff out), how "difficult" is it to use the manual scripts instead of the warrior if I use my own debian install. is it just a matter of installing the dependencies, starting the script and let it rip? does it need a lot of micromanaging? [12:23]
joepie91_Lagittaja: nope; it's pretty much run-and-forget other than needing to terminate/git-pull/restart for updates
also there's no automatic project switching when using manual scripts
so if you want to switch projects, you need to clone and run the new scripts yourself
[12:25]
Lagittajaalrighty then, pretty much what I expected. thank you joepie91_ [12:26]
***mls has joined #archiveteam-bs
Soulflare has quit IRC (Quit: http://drsclan.net)
Soulflare has joined #archiveteam-bs
Dimtree_ is now known as Dimtree
[12:37]
pizzaiolo has quit IRC (Quit: pizzaiolo)
pizzaiolo has joined #archiveteam-bs
refeed has joined #archiveteam-bs
[12:48]
..... (idle for 20mn)
hook54321I keep on getting temp banned from bitly
Doesn't last for very long though, it's kinda weird.
[13:12]
SoulflareI was not aware bitly would even do that [13:14]
Kalroththat's normal for me too
but i run a warrior with 6 connections
[13:14]
hook54321I am too, but I hardly ever have 6 jobs at once.
When I'm banned it says:
Forbidden
Uh oh, Bitly can't show you the page you are trying to access.
[13:14]
refeedhmmm, btw can warrior handle thing like cloudflare ddos protection? [13:22]
joepie91_not currently
patches very welcome :)
@ refeed
specifically, this would need to be ported to Python: https://gist.github.com/joepie91/c5949279cd52ce5cb646d7bd03c3ea36 (assuming their algo hasn't changed in the meantime)
[13:24]
refeedeuhm, that's javascript :/ , I still wonder how cloudflare javascript IUA challenge works
btw there's a python library that can be used for that
https://github.com/Anorov/cloudflare-scrape
[13:27]
JAAYes, but it relies on NodeJS.
I've used it before, and I thought about implementing something in pure-Python instead.
Nice code joepie91_, I might play around with porting that at some point.
And no, I don't think anything has changed.
[13:28]
joepie91_refeed: hence 'porting to Pythoin' :)
Python *
there are afaik currently no Python implementations of that
they all just shell out to phantomjs or equivalent which is a terrible solution
[13:30]
JAAYep, although it is probably more robust if CF decides to make subtle changes to the challenge. [13:31]
joepie91_the code I linked is an implementation that operates directly on the page source without needing an additional JS runtime, so could be ported directly to Python
JAA: sure, but at that point you can also just hard-fail
and fix the code
[13:31]
JAAIndeed [13:31]
joepie91_detecting cloudflare interstitials is much easier than breaking them :P [13:31]
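A minimal sketch of the "detect and hard-fail" approach joepie91_ and JAA discuss above. The markers it checks for (HTTP 503, a `Server` header containing "cloudflare", and strings such as `jschl_vc` in the body) are assumptions about what the challenge page looked like at the time, and all names are hypothetical; none of this is taken from the warrior code or from the linked gist.

```python
class CloudflareChallengeError(RuntimeError):
    """Raised when a response looks like a Cloudflare interstitial."""


# Assumed markers of the JavaScript challenge page; these may change.
CHALLENGE_MARKERS = ("jschl_vc", "jschl_answer", "Checking your browser")


def looks_like_cloudflare_challenge(status: int, headers: dict, body: str) -> bool:
    """Heuristically decide whether a response is a challenge page."""
    server = ""
    for name, value in headers.items():
        if name.lower() == "server":
            server = value.lower()
    if status != 503 or "cloudflare" not in server:
        return False
    return any(marker in body for marker in CHALLENGE_MARKERS)


def ensure_not_challenged(status: int, headers: dict, body: str) -> str:
    """Return the body unchanged, or hard-fail if it is a challenge page."""
    if looks_like_cloudflare_challenge(status, headers, body):
        raise CloudflareChallengeError("response looks like a Cloudflare challenge")
    return body
```

Failing loudly means a changed challenge shows up as an error to fix rather than as interstitial pages silently saved in place of the real content.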
refeedokay, I didn't look to it deeper [13:44]
............. (idle for 1h0mn)
***sep332 has joined #archiveteam-bs [14:44]
....... (idle for 34mn)
joepie91_refeed: btw, the cloudflare-scrape library you linked is particularly dangerous, because it executes arbitrary JS in Node (which has access to things like the filesystem, process API, etc.)
refeed: (their idea that it is now 'secure' because they're using Object.create(null) is wrong, btw)
[15:18]
refeedwew, okay, thanks for the heads-up, I just read the warning in its readme, I thought it was already secured, apparently not [15:27]
btw, currently I just use it to get past the archive.is cloudflare challenge, I'm pretty sure they (cloudflare and archive.is) won't do evil things [15:33]
JAAarchive.is has a CF challenge? [15:33]
***TheLovina has quit IRC (Ping timeout: 370 seconds) [15:34]
refeedJAA: yes [15:34]
JAArefeed: Example? I've never seen one... [15:34]
refeedJAA: well, you can see it by yourself by running `$ curl -X GET "https://archive.is"`, or by visiting archive.is in your browser's incognito mode
refeed is taking a screenshot
https://imgur.com/a/g1Dsq
[15:37]
...... (idle for 27mn)
JAArefeed: That must be geo-limited or something then. I can access archive.is directly, including with curl or in a private window.
Interestingly, I get redirected to archive.fo in Firefox but not with curl.
[16:08]
refeedJAA: well, that's interesting
accessing archive.fo from my location still gets a cloudflare challenge, but now with no https :/
[16:14]
***BartoCH has joined #archiveteam-bs [16:30]
godaneso we may want to faraday cage the IA and its backups: http://worldif.economist.com/article/13526/electromagnetic-shock [16:31]
***refeed has quit IRC (Quit: Leaving) [16:34]
mlsHm, visiting archive.is in Pale Moon incognito doesn't trigger anything (with and without addons) [16:34]
***Mateon1 has quit IRC (Read error: Operation timed out)
Mateon1 has joined #archiveteam-bs
[16:42]
Aranje has joined #archiveteam-bs [16:49]
.... (idle for 19mn)
joepie91_there's a pretty widespread belief that the `vm` module in Node provides secure sandboxing, but it really really does not and never will [17:08]
***etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
etudier has joined #archiveteam-bs
TheLovina has joined #archiveteam-bs
[17:09]
...... (idle for 27mn)
kim__ has joined #archiveteam-bs [17:41]
kim__Does anyone know if I as a user on reddit.com can set up a filter, and filter out all the bots, maybe whitelist some of the bots, but aargh, the bots piss me off, and THAT pisses me off, because that's the bot creators' whole idea.
<-- new on reddit
[17:43]
rocodeUsing RES, yes.
kim__, https://redditenhancementsuite.com/
[17:46]
kim__thankyou - will look into this :D [17:48]
....... (idle for 30mn)
***Asparagir has joined #archiveteam-bs
svchfoo3 sets mode: +o Asparagir
svchfoo1 sets mode: +o Asparagir
[18:18]
second has joined #archiveteam-bs [18:24]
JAAWhat do you mean by "categorising data"? [18:25]
secondIs there a good tool to categorize data? [18:25]
rocodeLike, apply meta data? [18:26]
secondyes [18:26]
rocodeAutomatically or manually? [18:26]
secondand then join them all together
both
[18:26]
rocodeFor upload to IA, or just to the files in general? [18:26]
secondsuch that going to a page for a game would also mention related movies and other articles on the subject
Both, mainly local though
the archive is nice but it's a bit lacking in joining data together
[18:26]
rocodeWell, IA wise, you can mass edit metadata at upload time or after via the IA CLI tool, https://internetarchive.readthedocs.io/en/latest/. As for locally, the only one I have ever really played around with is GNOME Tracker, https://en.wikipedia.org/wiki/Tracker_(search_software) [18:31]
JAAI like to keep it simple and use directories and symlinks (or hardlinks, depending on the circumstances). Every single software can handle that, unlike xattrs or separate metadata files. [18:39]
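As an illustration of the directories-and-symlinks approach JAA mentions, here is a small sketch that files one item under several categories by linking to it. The `by-category` layout and the function name are made up for this example; JAA does not describe an actual layout.

```python
import os
from pathlib import Path


def tag_with_symlink(item: Path, library_root: Path, category: str) -> Path:
    """Expose `item` under library_root/by-category/<category>/ via a symlink.

    The directory layout is a hypothetical choice for this sketch.
    """
    category_dir = library_root / "by-category" / category
    category_dir.mkdir(parents=True, exist_ok=True)
    link = category_dir / item.name
    # Skip if something (even a dangling link) already sits at that name.
    if not (link.is_symlink() or link.exists()):
        # A relative target keeps the links valid if the whole tree is moved.
        target = os.path.relpath(item, start=category_dir)
        link.symlink_to(target)
    return link
```

Calling it twice with different categories for the same file gives the "join things together" effect second asked about: a game image can appear under both `games` and, say, `1998`, while existing only once on disk.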
odemggodane, see news at MySpleen... fuck me!! https://portland.craigslist.org/mlt/emd/d/trip-down-analog-lane-27/6283808412.html
Number of tapes: approx. 23-24,000 (consists primarily of 6 & 8 hr Sony and TDK tapes).
[18:46]
godaneIA may need to email that guy to see if they can get a few of the tapes [18:52]
odemgAll of them. [18:52]
secondIs there any way to download warcs from the IA? [19:00]
godanesince it has news on a lot of tapes i think IA would take it [19:01]
JAAsecond: Depends. IA's own crawls and anything saved through web.archive.org/save/... isn't available. Our WARCs are. [19:02]
godaneodemg: if anyone is willing to send me tapes i will digitize them [19:03]
JAA(Those are just some examples. There are many more "types" of files on IA, e.g. those from the commercial (?) service Archive-It.) [19:03]
odemggodane, make sure the right people at ia know, I'd rather see them deal with this than myspleen [19:06]
secondHow do I check how a website was archived? [19:10]
***TheLovina has quit IRC (Read error: Operation timed out) [19:11]
JAAsecond: In the Wayback Machine, you mean? Click on "About this capture" in the top right. It doesn't help you with finding the actual IA item (where you might be able to download the WARCs) though. [19:11]
Selavire: craigslist VHS hoard. I'm a 40 minute drive away, but transporting would be a bit of an issue for me, and I would have to rent a storage unit [19:17]
AsparagirI know there is at least one other frequent ArchiveTeam member who lives near there too, but I don't want to out him here without his consent. But maybe he could help you with loading this stuff into a truck, if we can find someone to provide and drive the truck.
I am going to cross-post the link to the Internet Archive Slack channel; maybe someone there will have an idea. And yes, the IA owns a truck.
[19:27]
***Mateon1 has quit IRC (Remote host closed the connection)
Mateon1 has joined #archiveteam-bs
[19:30]
odemgCan we just stop and grasp... this guy has been recording tapes since 1986... and filled 24,000 tapes!! [19:32]
zinoHe deserves some kind of medal. [19:33]
odemgGiven his lowest conservative estimates that's 15 years of constant play video content [19:33]
zinoAre there digitization machines for VHS that can run the tape faster than real time?
A normal VHS player loses tracking as soon as you go faster than 1x.
[19:35]
MrRadarYeah, a better solution would probably just be to buy as many decks as possible and run them in parallel
Especially since they're relatively available and cheap right now
[19:37]
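The arithmetic behind odemg's "15 years" estimate and MrRadar's parallel-decks suggestion can be worked through like this. The tape count and tape lengths come from the Craigslist listing quoted above; the number of decks is a made-up figure for illustration, and real-time 1x capture is assumed, as zino notes.

```python
HOURS_PER_YEAR = 24 * 365


def playback_years(tapes: int, hours_per_tape: float) -> float:
    """Years of continuous playback needed to watch every tape once."""
    return tapes * hours_per_tape / HOURS_PER_YEAR


def wall_clock_years(tapes: int, hours_per_tape: float, decks: int) -> float:
    """Years needed if `decks` players digitize in parallel at 1x, nonstop."""
    if decks < 1:
        raise ValueError("need at least one deck")
    return playback_years(tapes, hours_per_tape) / decks


# Low end of the listing: 23,000 tapes, all treated as 6-hour tapes.
low_estimate = playback_years(23_000, 6)
# Hypothetical setup: ten decks running around the clock.
with_ten_decks = wall_clock_years(23_000, 6, decks=10)
```

The low estimate comes out at roughly 15.8 years of playback, which matches the "15 years" figure; ten decks running without pause would still need over a year and a half, before counting tape changes and failures.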
AsparagirThis seems kind of like the Marion Marguerite Stokes collection, although fewer tapes. This guy ran twelve VCR's in Oregon for twenty-seven years, while Stokes ran eight VCR's in Philadelphia for thirty-five years.
Stokes' tapes are now at the Internet Archive!
I just posted about all this in the Slack channel. I hope they come back to us soon.
SketchCow: this is something you need to see ^
[19:38]
zinoAsparagir, has IA documented what method they used for digitizing somewhere?
I have who moving boxes with VHS I should digitize and throw away sooner rather than later.
[19:39]
AsparagirDunno. I don't work there, I just drop by their main building a lot because I use their WiFi to upload big files to them. [19:40]
zinos/who/two/ [19:40]
Asparagir100 GB/s upload speeds. It is glorious.
Or was that MB/s.
Anyway, something insanely fast, and free.
[19:40]
zinoOh, too bad, I was going to ask what sorcery wifi you have over there in the US. :) [19:41]
AsparagirHahaha [19:41]
MrRadarSpeaking of, does anyone have a good suggestion for a USB analog video capture device that plays nice with Linux?
(NTSC if that matters)
[19:41]
zinoMrRadar, not a recommendation, but these seem commonly used on Linux: https://linuxtv.org/wiki/index.php/Easycap
And they are cheap. I bought 6 different variants of them this spring, but I haven't had time to test them yet.
[19:43]
MrRadarThanks, I'll check them out [19:44]
hook54321Where's the Craigslist link?
nvm
oh gosh. free of charge? has someone contacted him?
We need to make sure someone doesn't snatch it
[19:45]
AsparagirI know, right? [19:49]
***K4k has quit IRC (Quit: WeeChat 1.6) [19:49]
AsparagirBut we can't contact him unless we have a definite plan for pick-up, load-in, destination (presumably the IA), and digitization. [19:49]
***K4k has joined #archiveteam-bs [19:49]
AsparagirNot fair to take the tapes without having a solid backing from the IA, or another institution. [19:50]
joepie91_I smell an archive corps project coming up [19:50]
AsparagirMe too. But we don't have our 501(c)3 designation yet. So it would probably be Fearless Leader asking for help from the public directly. And the IA has to sign off on being the home of the data. [19:51]
hook54321How much man work is involved in archiving this? For the past few days I've been thinking about having the robotics team that's at my old high school collect old software CDs and do the whole archiving process on them, but this could work too.
Also, I'm close to this place
[19:51]
***K4k has quit IRC (Client Quit) [19:52]
hook54321Did someone in slack respond yet? [19:53]
***K4k has joined #archiveteam-bs [19:53]
hook54321I'm going to notify them and ask if they would be willing to help.
About how much space would this take up?
[19:55]
odemghook54321, Asparagir Problem being MySpleen has this post in their news second, hopefully the guy doesn't hand off to them before ia/at get their ass into gear and convince him the tapes are best off at ia
second* (for 16 hours now)
[19:59]
hook54321huh?
I sent a message to Jason on twitter
[20:00]
Asparagirodemg: Is there a link to the MySpleen post? I'm not a member of that tracker... [20:03]
***K4k has quit IRC (Quit: WeeChat 1.6)
K4k has joined #archiveteam-bs
[20:03]
odemgAsparagir, https://i.imgur.com/nX4LEzD.png [20:04]
DFJustinseems like a job for archive corps [20:04]
odemgYup, he wants the whole lot taken as a collection, not to have it parted out [20:05]
hook54321What's MySpleen? [20:05]
Asparagirodemg: Thanks. [20:05]
hook54321We need to contact this guy asap. [20:05]
odemghook54321, private tracker that has content much like this [20:05]
hook54321ah [20:05]
odemggodane, this reminds me I need to give the other tape guy a nudge... [20:06]
AsparagirWe can't contact him without there being buy-in from an archive or institution first. Unless someone here has a boatload of money and we can hire our own truck, hire a storage unit, and pay people's salaries to do digitization/uploading/metadata on this, which could take a year or more. [20:07]
hook54321We could get volunteers.
How many car loads would it take to transport them somewhere?
[20:08]
AsparagirFor the load-in part, volunteers would be great. That's not so hard. But we then need to store the tapes, and work on systematically getting them turned into data. That's harder, and expensive. Remember how SketchCow was stuck with paying the fees on the storage unit for all the catalogs he rescued with ArchiveCorps, hundreds of dollars each month. [20:09]
hook54321We could crowdfund it through something like GoFundMe. [20:13]
astridIA took delivery of a truckload of videotapes a while ago iirc [20:14]
hook54321Where were the tapes located though?
Jason responded to my message on Twitter.
He said "That is not a life well spent"
[20:15]
Selaviouch [20:19]
hook54321https://blog.archive.org/2013/11/22/a-dream-to-preserve-tv-news-on-the-road-to-realization-with-your-help/
"140,000 video cassettes"
I think there's a good chance they'll accept it
[20:23]
godanei think it was figured out to be 40k tapes not 140k [20:25]
AsparagirSketchCow responded in the Slack channel and said he would get into it more when he gets home (from wherever he is right now). But he did tag the head of the Internet Archive (Brewster Kahle) into the conversation, so eyes are on the project. [20:25]
godanecan anyone post the internet archive slack channel here? [20:26]
hook54321^
You have to have an archive.org email address, unless someone sends you an invite.
[20:27]
AsparagirIt's internetarchive.slack.com -- but you need someone who works at the Archive to invite you to it. I don't work there. [20:27]
godaneok [20:27]
AsparagirAnd there is only one channel open to non-employees like us. [20:27]
hook54321Apparently all the vhs tapes would take up 4800sqft [20:28]
AsparagirAsk SketchCow for invites, or maybe try the #internetarchive IRC channel, although people are rarely in there. [20:28]
hook54321If this person took the time to record all of these, I would think that they would be glad to see it go to somewhere like the Internet Archive, if they haven't already given it away. [20:33]
***Aranje has quit IRC (Ping timeout: 245 seconds) [20:33]
Selaviagreed [20:33]
hook54321I think if we contacted this guy and said that we're just trying to figure out the details, and we talk about the historical significance of them some, what IA plans to do with them, etc., then he might give it to us instead of someone else if they are still in his possession. [20:35]
AsparagirSketchCow is writing back to the Slack channel right now -- he just got off a 30 minute phone call with the guy!
quotes:
WELL I JUST TALKED TO DON FOR 30 MINUTES, THANKS BROOKE
There are several people who want this, to digitize
He wants to give the tapes away
He actually kept mentioning Marion stokes
So he was impressed we were those people
I'm mail him, he's going to make us all meet each other
...
So...that sounds promising!
[20:43]
Selavinice! [20:47]
***Aranje has joined #archiveteam-bs [20:49]
mundussweet :D [20:49]
hook54321Awesome! Are we supposed to meet each other in person, or? [20:54]
........ (idle for 39mn)
AsparagirWell, there has been talk of having an ArchiveTeamCo one of these days...
*ArchiveTeamCon
[21:33]
.... (idle for 16mn)
DFJustinhmm "It would probably be good to settle the main details by September 1st, so a possible January event could take place." http://www.archiveteam.org/index.php?title=Archive_Team_CONspiracy [21:49]
SketchCowYeah
I got a little busy
And people weren't stepping up
If it gets shifted forward, it's forward
Maybe Valentine's Day
Archive Team Conspiracy: The Valentine's Day Massacre
[21:52]
joepie91_that sounds exactly the right amount of sketchy [21:53]
DFJustinI look forward to explaining that to the friendly US customs gentleman [21:56]
***Stiletto has quit IRC (Ping timeout: 260 seconds) [21:57]
Selavilol [21:57]
***drumstick has joined #archiveteam-bs [21:58]
joepie91_DFJustin: probably further improved by "bring your old tape drives" day
it seems that the older the technology, the more indistinguishable it is from homemade explosives
[21:58]
astridthat sounds about right [22:07]
***dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam-bs
[22:16]
AsparagirOld nitrate film, in particular! [22:20]
.... (idle for 17mn)
***Honno has quit IRC (Read error: Operation timed out) [22:37]
BartoCH has quit IRC (Quit: WeeChat 1.9)
dashcloud has quit IRC (Remote host closed the connection)
Mateon1 has quit IRC (Remote host closed the connection)
Mateon1 has joined #archiveteam-bs
dashcloud has joined #archiveteam-bs
[22:42]
astrideep ahahaha yes [23:06]
