#archiveteam-bs 2017-08-24,Thu

WhoWhatWhen
***Geekonoci has joined #archiveteam-bs [00:26]
Geekonoci has quit IRC (Quit: Page closed) [00:34]
hook54321How should I organize that? A table? or just a plain list? [00:46]
JAAWhat information do you have?
If it's really just a list of account names plus maybe some comments/notes, then I'd say plain list (alphabetically sorted, I guess). But if you have additional data for many of the accounts, a table might be better.
[00:48]
***BlueMaxim has joined #archiveteam-bs [00:58]
hook54321A lot of the time I discover it because people ask for it to be excluded, publicly on twitter... [01:02]
balrog, JAA [01:11]
***qw3rty116 has joined #archiveteam-bs
qw3rty115 has quit IRC (Read error: Operation timed out)
[01:14]
godanei'm at 1083k items [01:30]
arkiverhook54321: nice [01:31]
hook54321arkiver: I'm supposed to work with you on OpenNIC stuff [01:33]
arkiverI just read that yeah :P [01:33]
Fuslo/ [01:34]
***Aranje has quit IRC (Remote host closed the connection) [01:34]
Fuslhook54321: thanks for taking care of and getting AT involved in this [01:34]
hook54321np
I'm not sure what we're going to do about the .free tld. At this point it makes the most sense to just grab the .libre sites, since the .free sites were moved over there. However, I'm not sure if we should be worried about other webpages still using .free URLs.
I also have no idea what we'll do if ICANN decides to create a TLD that's already an OpenNIC tld.
again
[01:34]
FroggingThere's nothing to do about it. The DNS has no provisions for conflicting roots
(does it?)
[01:39]
hook54321In the case of .free, OpenNIC is basically just moving all of the .free domains over to .libre
Personally, I'm hoping that's what they're planning to do if this happens again.
[01:40]
Froggingwhat else can be done? [01:42]
hook54321This is their position on it: [01:44]
Somebody2a subdomain? .free.opennic? [01:44]
hook54321"What is OpenNIC's relationship with the other alternative roots and ICANN?
OpenNIC currently recognizes and peers all of the existing ICANN TLDs (.com, .uk, etc.). Therefore, if you configure your computer to resolve OpenNIC domains, you'll also be able to resolve all of the ICANN TLDs automatically.
OpenNIC has not yet evaluated nor does it hold a formal position on the current/future ICANN TLDs."
Somebody2: I don't think they would like that
[01:44]
Somebody2hook54321: who?
ICANN or OpenNIC, or someone else?
[01:45]
hook54321OpenNIC [01:45]
Somebody2why not? It would make it clear the source of the domains...? [01:46]
joepie91today in "wtf Google": https://twitter.com/joepie91/status/900534232296161284 [01:46]
Somebody2they could even do the same with ICANN domains: .com.icann [01:46]
hook54321That would break ssl certificates though [01:46]
Froggingjoepie91: http://i.imgur.com/By95Lva.jpg [01:47]
hook54321and it kinda defeats the purpose of what they're trying to do [01:47]
joepie91lol [01:47]
Somebody2hook54321: between converting domains previously at .free into domains at .libre vs converting them into domains at .free.opennic -- I'm not sure why .libre is better... can you clarify?
and how does adding .opennic or .icann at the end of domains defeat what they are trying to do?
[01:56]
hook54321because then it's a subdomain [01:59]
Froggingexample.com is a subdomain of .com [02:00]
hook54321.com is the TLD though [02:00]
FroggingDNS doesn't distinguish
http://com/ is valid
[02:00]
Somebody2DNS doesn't, but various protocols on top do in various subtle ways [02:01]
Froggingand as a side note, I am now immensely confused at the result of visiting that URL [02:01]
Somebody2like various browsers do different things depending on whether a DNS segment is a toplevel or not [02:01]
Froggingoh I see. someone has a sense of humour. ".XYZ is the next .COM. .XYZ is the #1 new domain in the world" [02:02]
Somebody2ok, so rather than .free.opennic, you could perform horrible things by using a different delimiter, e.g. .free-opennic, or even .free;opennic
Frogging: oddly, neither dig nor curl are able to follow that redirect.
[02:03]
Froggingindeed, I think that one does not actually work, it was my browser adding .com to it automatically [02:08]
Somebody2Ah, this is because the browser automatically converts "http://com/" into a request for http://www.com.com/ [02:08]
Froggingtry this one http://dk/ [02:09]
Somebody2yes, that one works in dig and curl
301 redirecting to https://www.dk-hostmaster.dk/
[02:10]
Froggingyup [02:10]
..... (idle for 24mn)
I was thinking about that line ".XYZ is the next .COM" and it occurred to me that I've never in my life seen a legitimate .xyz website
so I went to one of the ones linked on the page
https://www.goinnovate.xyz/
this is... frightful.
"fog computing"
http://www.exponentials.xyz/posts/the-roles-of-cloud-computing-and-fog-computing-in-the-internet-of-things-revolut-6205388
[02:34]
.... (idle for 17mn)
***drumstick has quit IRC (Ping timeout: 268 seconds) [02:53]
drumstick has joined #archiveteam-bs [03:00]
qw3rty117 has joined #archiveteam-bs [03:14]
hook54321I'm surprised that IA actually complied with this guy's request: https://twitter.com/darthodius/status/658731881626783745 [03:18]
***qw3rty116 has quit IRC (Read error: Operation timed out) [03:20]
..... (idle for 21mn)
Fletcher has quit IRC (Remote host closed the connection) [03:41]
.... (idle for 15mn)
Fletcher has joined #archiveteam-bs [03:56]
.... (idle for 19mn)
Sk1d has quit IRC (Ping timeout: 250 seconds) [04:15]
hook54321public.resource.org refers to the Internet Archive building as "the Church of the Internet Archive" [04:16]
AsparagirWell, the bulding really was an old church once.
*building
It still has pews and all that.
[04:20]
***Sk1d has joined #archiveteam-bs [04:22]
.... (idle for 16mn)
hook54321Asparagir: yeah, i know.
I wonder if anyone has ever gone there in person and angrily demanded their site get removed from the wayback machine
[04:38]
.... (idle for 19mn)
***Asparagir has quit IRC (Asparagir) [04:58]
......... (idle for 40mn)
jrwrthis guy is my history hero https://www.youtube.com/watch?v=ZqUm1YXTxNc
Steve1989MREInfo
[05:38]
....... (idle for 34mn)
***Honno has joined #archiveteam-bs
RichardG has quit IRC (Read error: Connection reset by peer)
RichardG has joined #archiveteam-bs
[06:12]
soja92 has joined #archiveteam-bs [06:31]
........ (idle for 37mn)
kristian_ has joined #archiveteam-bs [07:08]
........ (idle for 36mn)
drumstick has quit IRC (Read error: Operation timed out)
drumstick has joined #archiveteam-bs
[07:44]
..... (idle for 23mn)
tuluu has quit IRC (Ping timeout: 245 seconds) [08:07]
tuluu has joined #archiveteam-bs [08:17]
........ (idle for 35mn)
Boppen has quit IRC (Ping timeout: 194 seconds)
drumstick has quit IRC (Ping timeout: 268 seconds)
kristian_ has quit IRC (Read error: Operation timed out)
[08:52]
etudier has joined #archiveteam-bs [09:00]
drumstick has joined #archiveteam-bs [09:05]
Boppen has joined #archiveteam-bs [09:15]
...... (idle for 25mn)
dashcloud has quit IRC (Read error: Connection reset by peer)
dashcloud has joined #archiveteam-bs
[09:40]
................... (idle for 1h31mn)
BlueMaxim has quit IRC (Read error: Connection reset by peer) [11:11]
.... (idle for 15mn)
drumstick has quit IRC (Read error: Operation timed out) [11:26]
.......... (idle for 49mn)
brayden has quit IRC (Read error: Connection reset by peer)
brayden has joined #archiveteam-bs
swebb sets mode: +o brayden
[12:15]
....... (idle for 32mn)
kurt has quit IRC (Read error: Operation timed out) [12:49]
............ (idle for 55mn)
SketchCowThe Archive has had to deal with a lot of crazy walking right in and demanding things, yes.
Two levels: The one you think of, someone coming in and demanding something related to content.
The other: Since it looks like a church, church-related things, like homeless or people asking for sanctuary, etc.
There are between 3-5 people who sleep at the Archive at night, since it looks like a church
There was one who was across the street for years, Thomas. He died last October, and even the night he died, we'd brought over extra food from the celebration going on, which he had.
http://richmondsfblog.com/2016/10/27/thomas-resident-homeless-man-at-funston-clement-passed-away-wednesday-night/
[13:44]
***BartoCH has quit IRC (Quit: WeeChat 1.9) [13:54]
vitzli has joined #archiveteam-bs [14:08]
TheLovina has quit IRC (Quit: Leaving) [14:21]
closure_til [14:30]
***Mateon1 has quit IRC (Read error: Operation timed out)
Mateon1 has joined #archiveteam-bs
[14:31]
REiN^ has quit IRC (Max SendQ exceeded)
REiN^ has joined #archiveteam-bs
[14:36]
JAAhook54321: That's for pages which are explicitly blocked due to a direct request to IA, I assume? [14:36]
hook54321I'm not exactly sure yet.
Whatever we make it out to be I guess
Maybe both, idk
[14:37]
JAAI just think that a list of pages blocked by robots.txt is probably *massive*. [14:39]
hook54321yeah. on the other hand, it could be used for us to have some sort of database of what IA might not be crawling.
*could be useful
[14:40]
.......................... (idle for 2h8mn)
***vitzli has quit IRC (Quit: Leaving) [16:48]
........ (idle for 37mn)
godaneanother page is missing for 01-01-2014
*2014-01-01
anyways it's page 24
[17:25]
***Asparagir has joined #archiveteam-bs [17:35]
....... (idle for 30mn)
namespace has joined #archiveteam-bs [18:05]
namespaceSo I'm trying to OCR/type up/something this PDF so it can be put on a public website as an actually readable/searchable/etc historical archive: https://archive.org/details/8BBSArchiveP1V1
(Also to generate more interest in it through sex appeal, because it's actually a fairly important bit of phreaker history.)
And it is just the nastiest PDF out there (blurry monospace font, holepunched so that sometimes entire bits of text are missing, scuff marks, etc).
The automatic archive.org OCR is actually better than what I could get using OCRFeeder: https://archive.org/stream/8BBSArchiveP1V1/8BBS_Archive_P1V1_djvu.txt
Is there any way to do better, or?
[18:06]
***schbirid has joined #archiveteam-bs [18:08]
..... (idle for 21mn)
astridwow that is nasty yeah
i have a very flakey ocr program that works well on single-font fixed-width material
https://github.com/chronomex/ess-ocr if you want to poke at it
all the magic bits are hardcoded, so you'd have to hardcode new constants to describe the pages you want to ocr
and that sort of thing
[18:29]
namespaceYeah here's the strategies I've thought up so far:
- Retype the whole thing (would prefer not to, takes TON of time, solitary)
[18:34]
astridi mean, this is the sort of thing that i wrote my ocr program for
this will ocr quite well
[18:35]
namespace- Highlight all the actual posts and then extract them as images, put up on a wiki and let other people help.
- OCR it, which yeah I'll try that thing you just posted thanks.
[18:35]
astridit's extremely fragile but this is fixed width enough to work super well
i am planning to do a total rewrite of the tool to make it useful, but for now you will be able to get it to work
watch out, tool is designed to work with pages that have a black frame around them, so you might need to remove some code about that
(it used the black frame for fiducial registration)
[18:35]
namespaceEheheh. [18:37]
astridcrop_to_rect is the function to comment out calls to [18:37]
atrocitywhere's the PDF you're trying to OCR? [18:37]
astridhttps://archive.org/stream/8BBSArchiveP1V1/8BBS_Archive_P1V1#page/n3/mode/1up [18:37]
atrocityi have pdf ocr software at work, i'll try running it on it, lol
downloading at like 300KiB/s, ugh
[18:39]
namespacewow lol
astrid: So stupid question, how do I compile this?
There's two files, no makefile, do I just compile them both separately and then execute one?
[18:41]
astridthey both have a comment at the beginning saying the magic compiler spells to cast
deskew comes from the leptonica library
[18:46]
namespaceAh, k.
Thanks.
[18:46]
astridso you might need to set that up
also you need to feed it 'pgm' images
[18:46]
namespaceAny dependencies? ^^;; [18:47]
astridess-ocr needs you to create a directory 'training' or it'll core
yeah libnetpnm among others
as i said
very rough :)
[18:47]
namespaceYeah I'm smelling a goose chase, thanks anyway. :p [18:49]
astridonce you get it running, call it over one of your images [18:49]
namespaceYes see that first bit.
Is the bit that is very unlikely to happen.
[18:49]
astridit'll write out 'crop.pgm' which should be cropped to include the content and include the grid it's using for segmentation
oh :(
ok
[18:50]
namespaceThis needs like, a readme.txt [18:50]
astridwell my plan for it involves a rewrite, because proper monospace ocr is a thing the world needs
desperately
[18:50]
namespaceOtherwise I'll just be asking you questions all day. [18:50]
astridas we all know [18:50]
namespaceYes, yes it does. [18:50]
astridyeah :| [18:50]
namespaceHere's my advice.
If you want to make this usable to others.
Make a debian/ubuntu/etc vm.
And set the thing up from scratch, and write down each step as you do.
That's your readme.txt
[18:50]
astridit's not currently intended to be useful, it's a quick hack
yeah
[18:51]
namespaceIt's how I do all my setup readme files and they always come out excellent, whereas people who just write them from memory always end up skipping steps and stuff. [18:51]
astrid"if you want to do monospace ocr, here's a very fiddly thing that gets good results"
i usually wind up with 2 or 3 misrecognized characters per page at most
[18:51]
namespaceYeah that would be incredible. [18:52]
astridit doesn't assume it knows what letters look like
instead it's like "hm, idk what this is, hey user, what is it?" and you say "that's a B, and you can use it as an example of other Bs"
so it builds training data as you go
but i'm planning a rewrite with maintainability and also smartness
next version will lay down a grid over the page that can be skewed and bent, so that photographs of monospace text can be recognized f.ex
and it'll be much much less manual fiddly process
[18:52]
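The interactive monospace approach astrid describes above (cut the page into a fixed character grid, compare each cell against labelled examples, and ask the user whenever nothing matches) can be sketched in Python roughly as follows. This is not code from ess-ocr: the function names (`cells`, `distance`, `recognize`), the pixel-difference distance, the `threshold` value, and the assumption of an already deskewed, binarized page with a known cell size are all illustrative choices.

```python
from typing import Callable, List, Optional, Tuple

Bitmap = List[List[int]]           # rows of 0/1 pixels, 1 = ink
Exemplar = Tuple[Bitmap, str]      # a labelled example cell


def cells(page: Bitmap, cell_w: int, cell_h: int,
          origin: Tuple[int, int] = (0, 0)) -> List[List[Bitmap]]:
    """Cut an already deskewed, binarized page into a fixed grid of cells.

    cell_w, cell_h and origin play the role of hand-tuned constants.
    """
    ox, oy = origin
    n_rows = (len(page) - oy) // cell_h
    n_cols = (len(page[0]) - ox) // cell_w
    grid = []
    for r in range(n_rows):
        y0 = oy + r * cell_h
        line = []
        for c in range(n_cols):
            x0 = ox + c * cell_w
            line.append([row[x0:x0 + cell_w] for row in page[y0:y0 + cell_h]])
        grid.append(line)
    return grid


def distance(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two cells of the same size."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(page: Bitmap, cell_w: int, cell_h: int,
              exemplars: List[Exemplar],
              ask: Callable[[Bitmap], str],
              threshold: int = 2) -> List[str]:
    """Return one string per text line, asking the user about unknown cells.

    Every answer from ask() is appended to exemplars, so the training data
    grows as pages are processed.
    """
    lines = []
    for row in cells(page, cell_w, cell_h):
        text = []
        for cell in row:
            if not any(any(px) for px in cell):
                text.append(" ")          # no ink: treat as a space
                continue
            best: Optional[Exemplar] = min(
                exemplars, key=lambda ex: distance(cell, ex[0]), default=None)
            if best is not None and distance(cell, best[0]) <= threshold:
                text.append(best[1])      # close enough to a known glyph
            else:
                label = ask(cell)         # "hey user, what is it?"
                exemplars.append((cell, label))
                text.append(label)
        lines.append("".join(text))
    return lines


# Usage: a one-line "page" of 3x3 cells reading "A AB".
A = [[0, 1, 0], [1, 1, 1], [1, 0, 1]]
B = [[1, 1, 0], [1, 1, 1], [1, 1, 0]]
BLANK = [[0, 0, 0]] * 3
glyphs = [A, BLANK, A, B]
page = [sum((g[y] for g in glyphs), []) for y in range(3)]

answers = iter(["A", "B"])          # stands in for the human at the keyboard
exemplars: List[Exemplar] = []
result = recognize(page, 3, 3, exemplars, ask=lambda cell: next(answers))
print(result)                       # ['A AB']
```

In this toy run the user is asked only twice: the second "A" matches the stored example exactly, while "B" differs from "A" by 3 pixels, which exceeds the threshold. Raising `threshold` trades fewer questions for more misrecognitions; photographed pages would additionally need the skewable, bendable grid astrid mentions.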
namespaceYes well, the next version is in the indeterminate future and I'd like this to be on the net now. :P [18:54]
astridbecause i have about 500 pages of photographs of typewritten text that i'd like to get into readable form
yes i know :
:)
[18:54]
atrocityi have my work OCR running, it's going VERY slowly, lol [18:58]
namespaceUnsurprising. XD [18:59]
atrocityi feel like i should've just run it on the first 10 pages or something instead of all 700, lol [19:01]
godaneso now page 36 and 37 of 2014-04-30 of east bay express is giving me 404 [19:02]
***HarryCros has quit IRC (Read error: Connection reset by peer)
HarryCros has joined #archiveteam-bs
[19:02]
godaneok looks there was a middle booklet for 'Bike to work day'
on may 8 that year
after that it continues from page 28
so its really page 8 and 9 of the 'bike to work day' booklet
[19:06]
***HarryCros has quit IRC (Remote host closed the connection)
HarryCros has joined #archiveteam-bs
[19:09]
godaneso there are pages 31 to 33 missing from 2014-06-04 [19:10]
atrocityhttp://meddl.com/temp/ocrtest.txt
that's what my work OCR app did, lol
the first 10 pages at least
[19:11]
***HarryCros has quit IRC (Remote host closed the connection) [19:11]
namespaceNot that bad, tbh. But only of about comparable quality to the archive.org OCR scan. [19:12]
atrocityyeah, lol
doesn't help that there's holes all through it
[19:13]
namespaceIt really *really* doesn't.
Like I said, nasty.
[19:13]
atrocitylol [19:15]
***C4K3_ has quit IRC (leaving)
C4K3 has joined #archiveteam-bs
[19:26]
kristian_ has joined #archiveteam-bs
odemg has quit IRC (Read error: Operation timed out)
[19:36]
.... (idle for 15mn)
odemg has joined #archiveteam-bs [19:55]
.......... (idle for 46mn)
bitBaron has joined #archiveteam-bs
bitBaron has quit IRC (Client Quit)
[20:41]
schbiridyeah, why not move the "WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD" bs to a specific channel?
if the main channel is too holy to even discuss itself
[20:48]
SanquiI would not give password to this person >> <coltaine_> and I give you sucky sucky [20:51]
schbiridSanqui Sanqui? [20:52]
atrocity$5 [21:02]
astridi'm with Sanqui here [21:07]
***Honno has quit IRC (Read error: Operation timed out)
HarryCros has joined #archiveteam-bs
[21:09]
Pudsey has joined #archiveteam-bs
HarryCros has quit IRC (Remote host closed the connection)
kristian_ has quit IRC (Ping timeout: 370 seconds)
HarryCros has joined #archiveteam-bs
HarryCros has quit IRC (Read error: Connection reset by peer)
HarryCros has joined #archiveteam-bs
[21:20]
Pudsey has quit IRC (Remote host closed the connection)
ZexaronS has joined #archiveteam-bs
kristian_ has joined #archiveteam-bs
[21:38]
REiN^ has quit IRC (Max SendQ exceeded)
REiN^ has joined #archiveteam-bs
REiN^ has quit IRC (Max SendQ exceeded)
REiN^ has joined #archiveteam-bs
drumstick has joined #archiveteam-bs
[21:50]
joepie91I'm not sure that this is what was meant with "steal from work": https://twitter.com/SeamusHughes/status/900790149017219073
:p
[21:56]
godanea guy put this up on r/opendirectories: http://95.211.186.214/Incoming/ [21:59]
joepie91ohh, lots of oldish stuff there
ha
yeah I'm pretty sure this guy went to... 33C3?
[22:00]
godanei can't grab it but someone here would want it [22:00]
joepie91fairly certain that the Leaks directory came off one of the FTPs there
godane: definitely, thanks :P
[22:00]
godanei got from here: https://www.reddit.com/r/opendirectories/comments/6vri47/large_dj_sets_directory/
i know that SketchCow is looking for older dj sets
[22:01]
joepie91godane: oh, is he? any particular type?
I may have a pile of my own
that is, sets from a specific internet radio channel and some of their live events
(afterhoursdjs.org, but it's stuff that isn't yet in the collection on IA)
[22:02]
godanehe is doing the hip hop mixtapes collection [22:02]
joepie91ah yeah, this is def not hip hop :p [22:03]
godanehe will take it [22:03]
joepie91this is what I have laying around: https://gist.githubusercontent.com/joepie91/1afb987f86a2b417c33a61a35a6c0f29/raw/84a200d991cf846e79133b78e046558b8104a822/gistfile1.txt (cc SketchCow - let me know if you want me to ship them to FOS or such)
haven't gotten around to sorting out the metadata yet
goes back to 2002 in some places :P
[22:05]
..... (idle for 22mn)
godaneuploaded : https://archive.org/details/forum.kingsnake.com-1997-to-2003-archives-20161203:
*uploaded : https://archive.org/details/forum.kingsnake.com-1997-to-2003-archives-20161203
[22:28]
..... (idle for 24mn)
***Stiletto has quit IRC (Read error: Operation timed out) [22:52]
........ (idle for 36mn)
kristian_ has quit IRC (Quit: Leaving) [23:28]
SketchCowThat DJ directory looks like crap. [23:38]
***BlueMaxim has joined #archiveteam-bs [23:40]
Stilett0 has joined #archiveteam-bs [23:52]
TC01 has quit IRC (Remote host closed the connection) [23:59]
